Re: [HACKERS] Postgres Crashes on Win2K
Hi

The O/S is Win 2K Server SP4 and PG version is 8.0. Here's the log file from the latest incident:

2006-06-16 09:40:01 LOG: 0: server process (PID 4744) was terminated by signal 125
2006-06-16 09:40:01 LOCATION: LogChildExit, postmaster.c:2335
2006-06-16 09:40:01 LOG: 0: terminating any other active server processes
2006-06-16 09:40:01 LOCATION: HandleChildCrash, postmaster.c:2228
2006-06-16 09:40:02 WARNING: 57P02: terminating connection because of crash of another server process
2006-06-16 09:40:02 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-06-16 09:40:02 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2006-06-16 09:40:02 LOCATION: quickdie, postgres.c:1890
2006-06-16 09:40:03 LOG: 0: all server processes terminated; reinitializing
2006-06-16 09:40:03 LOCATION: reaper, postmaster.c:2127
2006-06-16 09:40:04 LOG: 0: database system was interrupted at 2006-06-15 20:03:42 GMT Daylight Time
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4054
2006-06-16 09:40:04 LOG: 0: checkpoint record is at 0/3C5B5350
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4123
2006-06-16 09:40:04 LOG: 0: redo record is at 0/3C5B5350; undo record is at 0/0; shutdown FALSE
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4151
2006-06-16 09:40:04 LOG: 0: next transaction ID: 2991949; next OID: 182344
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4154
2006-06-16 09:40:04 LOG: 0: database system was not properly shut down; automatic recovery in progress
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4210
2006-06-16 09:40:04 LOG: 0: record with zero length at 0/3C5B5390
2006-06-16 09:40:04 LOCATION: ReadRecord, xlog.c:2487
2006-06-16 09:40:04 LOG: 0: redo is not required
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4312
2006-06-16 09:40:04 LOG: 0: database system is ready
2006-06-16 09:40:04 LOCATION: StartupXLOG, xlog.c:4517

Thanks
Ch.
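Since the crashes are intermittent, it may help to pull every "terminated by signal" entry out of the server log and line the timestamps up against scheduled jobs. What follows is only a sketch, not a PostgreSQL tool: the names `CrashEvent` and `find_crashes` are made up for illustration, and the pattern assumes the line layout shown in this thread (timestamp, `LOG:`, an optional SQLSTATE field when log_error_verbosity is verbose, then the message).

```python
import re
from typing import Iterator, NamedTuple


class CrashEvent(NamedTuple):
    """One 'server process ... was terminated by signal N' log entry."""
    timestamp: str
    pid: int
    signal: int


# Assumed layout, taken from the log excerpts in this thread:
#   2006-06-16 09:40:01 LOG: 0: server process (PID 4744) was terminated by signal 125
# The "0: " SQLSTATE field only appears in verbose mode, so it is optional.
CRASH_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) LOG:\s+(?:\w+:\s+)?"
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal (?P<sig>\d+)"
)


def find_crashes(log_text: str) -> Iterator[CrashEvent]:
    """Yield a CrashEvent for each backend-crash line found in log_text."""
    for match in CRASH_RE.finditer(log_text):
        yield CrashEvent(match["ts"], int(match["pid"]), int(match["sig"]))
```

Reading the whole log file into a string and iterating over `find_crashes(text)` gives one tuple per crash; the file location depends on how logging is configured on the server.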
Re: [HACKERS] Postgres Crashes on Win2K
On Thursday 25 May 2006 10:47, chelsea boot wrote:
> Hi
>
> Can anyone offer advice on this please: Intermittently a postgres process
> appears to crash and the postmaster service needs restarting. The
> following is entered in the log:
>
> 2006-03-15 09:50:03 LOG: server process (PID 348) was terminated by signal 125
> 2006-03-15 09:50:03 LOG: terminating any other active server processes
> 2006-03-15 09:50:06 FATAL: the database system is in recovery mode
> 2006-03-15 09:50:06 LOG: all server processes terminated; reinitializing
> 2006-03-15 09:50:06 FATAL: the database system is starting up
> 2006-03-15 09:50:06 LOG: database system was interrupted at 2006-03-15 00:16:39 GMT Standard Time
> 2006-03-15 09:50:06 LOG: checkpoint record is at 0/C1F3E48
> 2006-03-15 09:50:06 LOG: redo record is at 0/C1F3E48; undo record is at 0/0; shutdown FALSE
> 2006-03-15 09:50:06 LOG: next transaction ID: 651022; next OID: 113846
> 2006-03-15 09:50:06 LOG: database system was not properly shut down; automatic recovery in progress
> 2006-03-15 09:50:06 FATAL: the database system is starting up
> 2006-03-15 09:50:06 LOG: record with zero length at 0/C1F3E88
> 2006-03-15 09:50:06 LOG: redo is not required
> 2006-03-15 09:50:06 FATAL: the database system is starting up
> 2006-03-15 09:50:06 LOG: database system is ready
>
> This is accompanied by 2 postgres errors in the Application Log of the
> Win2K server: (1) "the application failed to initialize properly
> (0xc142). Click on OK to terminate the application" (2) "the exception
> unknown software exception (0xc0fd) occurred in the application at
> 0x7c59bd01. Click on OK to terminate the application".
>
> Alternatively I get the same messages as above but with an additional
> WARNING:
>
> WARNING: terminating connection because of a crash of another server process
> DETAIL: The postmaster has commanded this server process to roll
> back the current transaction and exit, possibly because another server
> process exited abnormally and possibly corrupted shared memory.
>
> What is the signal 125 and how can I troubleshoot what is causing the
> process to crash as the error occurs at random times e.g. 8pm when no-one is
> using the network and no utility is running such as AV or backup or during
> the day?
>
> I've posted to novice and ports but have been unable to find a solution.
>

Try setting log_error_verbosity to verbose and sending that info in along with OS and PG versions.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not match
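For reference, the setting suggested above lives in postgresql.conf. A minimal fragment might look like the following; it assumes a server version that has this parameter (8.0, as reported in the follow-up, does), and the server needs to re-read its configuration before the change takes effect. In verbose mode each message also carries its SQLSTATE code and the source file, function and line number that raised it, which is where the LOCATION lines in the follow-up log come from.

```
# postgresql.conf
log_error_verbosity = verbose    # accepted values: terse, default, verbose
```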
[HACKERS] Postgres Crashes on Win2K
Hi

Can anyone offer advice on this please: Intermittently a postgres process appears to crash and the postmaster service needs restarting. The following is entered in the log:

2006-03-15 09:50:03 LOG: server process (PID 348) was terminated by signal 125
2006-03-15 09:50:03 LOG: terminating any other active server processes
2006-03-15 09:50:06 FATAL: the database system is in recovery mode
2006-03-15 09:50:06 LOG: all server processes terminated; reinitializing
2006-03-15 09:50:06 FATAL: the database system is starting up
2006-03-15 09:50:06 LOG: database system was interrupted at 2006-03-15 00:16:39 GMT Standard Time
2006-03-15 09:50:06 LOG: checkpoint record is at 0/C1F3E48
2006-03-15 09:50:06 LOG: redo record is at 0/C1F3E48; undo record is at 0/0; shutdown FALSE
2006-03-15 09:50:06 LOG: next transaction ID: 651022; next OID: 113846
2006-03-15 09:50:06 LOG: database system was not properly shut down; automatic recovery in progress
2006-03-15 09:50:06 FATAL: the database system is starting up
2006-03-15 09:50:06 LOG: record with zero length at 0/C1F3E88
2006-03-15 09:50:06 LOG: redo is not required
2006-03-15 09:50:06 FATAL: the database system is starting up
2006-03-15 09:50:06 LOG: database system is ready

This is accompanied by 2 postgres errors in the Application Log of the Win2K server:

(1) "the application failed to initialize properly (0xc142). Click on OK to terminate the application"
(2) "the exception unknown software exception (0xc0fd) occurred in the application at 0x7c59bd01. Click on OK to terminate the application".

Alternatively I get the same messages as above but with an additional WARNING:

WARNING: terminating connection because of a crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, possibly because another server process exited abnormally and possibly corrupted shared memory.

What is the signal 125, and how can I troubleshoot what is causing the process to crash? The error occurs at random times, e.g. at 8pm when no one is using the network and no utility such as AV or backup is running, or during the day.

I've posted to novice and ports but have been unable to find a solution.

Many Thanks
Ch.
Re: [HACKERS] Postgres Crashes
After reproducing these crashes and running tests long enough, we found that these Postgres crashes happen on Linux 2.4.18 and not on Linux 2.4.25. In all likelihood, this is a kernel (or driver) issue.

Could you kindly ensure this gets on my mail thread with subject "Postgres Crashes"?

Thanks
Prem.

On Wed, 5 May 2004, Tom Lane wrote:
> [EMAIL PROTECTED] (Prem Gopalan) writes:
> > The dying process is postmaster. After these crashes the server is
> > missing from ps and no more new connections are possible. The backend
> > processes stay on till their connections close.
>
> That behavior does sound like a postmaster crash --- but all the stack
> traces you show are clearly in backend code. A backend crash ought not
> take out the postmaster. So something fairly odd is going on here.
>
> What if anything shows up in the postmaster's stderr log when this
> happens?
>
> regards, tom lane
>

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Postgres Crashes
This sounds very much like a memory problem. I would replace all of the memory with another set of (preferably known good) memory and see if the problems persist.

Also look for other cores that may be dropped. If there are several, memory is the likely cause. Be aware that it will likely be active, large-memory applications (of which PostgreSQL may be the only one on the server) that expose the problem.

Memory-testing applications may also show the problem; however, they do not exercise memory the way production use does. I have had test apps run clean for weeks where production use caused failures in mere minutes.

Also, note that I have seen bad CPUs (bad cache?) cause similar problems.

On 30 Apr 2004, at 15:24, Prem Gopalan wrote:

> We run a multithreaded application that uses postgres 7.4 on Linux
> 2.4.18, dual cpu Xeon processor machine. We have occasional weird
> crashes and have tried a lot of things to reproduce them in house, but
> in vain. We do have coredumps and I have listed the backtraces and
> their common characteristics here.
> ...

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
Re: [HACKERS] Postgres Crashes
[EMAIL PROTECTED] (Prem Gopalan) writes:
> The dying process is postmaster. After these crashes the server is
> missing from ps and no more new connections are possible. The backend
> processes stay on till their connections close.

That behavior does sound like a postmaster crash --- but all the stack
traces you show are clearly in backend code. A backend crash ought not
take out the postmaster. So something fairly odd is going on here.

What if anything shows up in the postmaster's stderr log when this
happens?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Postgres Crashes
Prem Gopalan wrote:
> We run a multithreaded application that uses postgres 7.4 on Linux
> 2.4.18, dual cpu Xeon processor machine. We have occasional weird
> crashes and have tried a lot of things to reproduce them in house, but
> in vain. We do have coredumps and I have listed the backtraces and
> their common characteristics here.

Whether your client is multi-threaded or not should have no effect on the postmaster and any crashes you see there.

This part of the backtrace seems significant:

> #3 0x081a767e in elog_finish (elevel=20, fmt=0x8235680 "invalid
> memory alloc request size %lu")

I wonder if you are allocating too much memory. Looking at the 7.4 code I see these all as ERROR, not FATAL (backend exits) or PANIC (postmaster exits), so it shouldn't be crashing anything:

./backend/utils/mmgr/mcxt.c:elog(ERROR, "invalid memory alloc request size %lu",
./backend/utils/mmgr/mcxt.c:elog(ERROR, "invalid memory alloc request size %lu",
./backend/utils/mmgr/mcxt.c:elog(ERROR, "invalid memory alloc request size %lu",
./backend/utils/mmgr/mcxt.c:elog(ERROR, "invalid memory alloc request size %lu",

Would you send over a backtrace that shows more levels above this? Can you reproduce this crash on demand?

I can't imagine why you would get this error. I wonder if you have a problem with bad memory on that machine?

--
  Bruce Momjian                      |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                  |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] Postgres Crashes
We run a multithreaded application that uses postgres 7.4 on Linux 2.4.18, dual cpu Xeon processor machine. We have occasional weird crashes and have tried a lot of things to reproduce them in house, but in vain. We do have coredumps and I have listed the backtraces and their common characteristics here.

Briefly, the last frame is a call to a glibc (or rarely some other shared lib) method. And the instruction pointer points to an indirect jmp instruction to the shared lib method. Almost all coredumps show this characteristic.

The dying process is postmaster. After these crashes the server is missing from ps and no more new connections are possible. The backend processes stay on till their connections close.

Any ideas appreciated.

core #1 (8706)
---
(gdb) bt
#0  0x0806f1c4 in snprintf ()
#1  0x081a7f50 in send_message_to_frontend (edata=0x826c1e0) at /root/src/postgres/src/backend/utils/error/elog.c:1239
#2  0x081a6c85 in errfinish (dummy=0) at /root/src/postgres/src/backend/utils/error/elog.c:359
#3  0x081a767e in elog_finish (elevel=20, fmt=0x8235680 "invalid memory alloc request size %lu") at /root/src/postgres/src/backend/utils/error/elog.c:853
(gdb) disassemble
Dump of assembler code for function snprintf:
0x0806f1c4:  jmp    *0x823f02c
0x0806f1ca:  push   $0x508
0x0806f1cf:  jmp    0x806e7a4 <_init+24>
End of assembler dump.
(gdb) x/i $pc
0x806f1c4:  jmp    *0x823f02c
(gdb) x *0x823f02c
0x182de130:  push   %ebp
(gdb) disassemble *0x823f02c
Dump of assembler code for function snprintf:
0x182de130:  push   %ebp
0x182de131:  mov    %esp,%ebp
0x182de133:  push   %ebx

core #2 (5889)
---
(gdb) bt
#0  0x0806f0b4 in memcpy ()
#1  0x08103cee in pq_getbytes (s=0xbfffeb5c ".", len=4) at /root/src/postgres/src/backend/libpq/pqcomm.c:748
#2  0x08103e04 in pq_getmessage (s=0xbfffec10, maxlen=0) at /root/src/postgres/src/backend/libpq/pqcomm.c:837
#3  0x0814c98b in SocketBackend (inBuf=0xbfffec10) at /root/src/postgres/src/backend/tcop/postgres.c:377
(gdb) disassemble
Dump of assembler code for function memcpy:
0x0806f0b4:  jmp    *0x823efe8
0x0806f0ba:  push   $0x480
0x0806f0bf:  jmp    0x806e7a4 <_init+24>
End of assembler dump.
(gdb) x/i $pc
0x806f0b4:  jmp    *0x823efe8
(gdb) x *0x823efe8
0x18304f18:  push   %ebp
(gdb) disassemble *0x823efe8
Dump of assembler code for function memcpy:
0x18304f18:  push   %ebp
0x18304f19:  mov    %esp,%ebp
0x18304f1b:  mov    0x10(%ebp),%eax

core #3 (32662)
---
(gdb) bt
#0  0x0806f3c4 in strncpy ()
#1  0x081b22fa in set_ps_display (activity=0x4 ) at /root/src/postgres/src/backend/utils/misc/ps_status.c:282
#2  0x0814f3f5 in PostgresMain (argc=4, argv=0x8279838, username=0x8279808 "postgres") at /root/src/postgres/src/backend/tcop/postgres.c:2805
#3  0x0812f24b in BackendFork (port=0x82877a8) at /root/src/postgres/src/backend/postmaster/postmaster.c:2558
(gdb) x/i $pc
0x806f3c4:  jmp    *0x823f0ac
(gdb) disassemble *0x823f0ac
Dump of assembler code for function strncpy:
0x183033c0:  push   %ebp
0x183033c1:  mov    %esp,%ebp
0x183033c3:  push   %edi

core #4 (28335)
---
(gdb) bt
#0  0x0806f0c1 in memcpy ()
#1  0x08103cee in pq_getbytes (s=0xbfffeb5c "\f", len=4) at /root/src/postgres/src/backend/libpq/pqcomm.c:748
#2  0x08103e04 in pq_getmessage (s=0xbfffec10, maxlen=0) at /root/src/postgres/src/backend/libpq/pqcomm.c:837
#3  0x0814c98b in SocketBackend (inBuf=0xbfffec10) at /root/src/postgres/src/backend/tcop/postgres.c:377
(gdb) x/i $pc
0x806f0c1:  idiv   %bh
(gdb) disassemble
Dump of assembler code for function memcpy:
0x0806f0b4:  jmp    *0x823efe8
0x0806f0ba:  push   $0x480
0x0806f0bf:  jmp    0x806e7a4 <_init+24>
End of assembler dump.
(gdb) disassemble *0x823efe8
Dump of assembler code for function memcpy:
0x18304f18:  push   %ebp
0x18304f19:  mov    %esp,%ebp
0x18304f1b:  mov    0x10(%ebp),%eax

core #5 (22375)
(gdb) bt
#0  0x0806f32c in SSL_CTX_use_certificate_file ()
#1  0x08103cee in pq_getbytes (s=0xbfffeb5c "\f", len=4) at /root/src/postgres/src/backend/libpq/pqcomm.c:748
#2  0x08103e04 in pq_getmessage (s=0xbfffec10, maxlen=0) at /root/src/postgres/src/backend/libpq/pqcomm.c:837
#3  0x0814c98b in SocketBackend (inBuf=0xbfffec10) at /root/src/postgres/src/backend/tcop/postgres.c:377
(gdb) x/i $pc
0x806f32c:  add    $0x70e9,%eax
(gdb) disassemble
Dump of assembler code for function SSL_CTX_use_certificate_file:
0x0806f324:  jmp    *0x823f084
0x0806f32a:  push   $0x5b8
0x0806f32f:  jmp    0x806e7a4 <_init+24>
End of assembler dump.
(gdb) disassemble *0x823f084
Dump of assembler code for function SSL_CTX_use_certificate_file:
0x0806f324:  jmp    *0x823f084
0x0806f32a:  push   $0x5b8
0x0806f32f:  jmp    0x806e7a4 <_init+24>
End of assembler dump.

core #6 (22371)
---
(gdb) bt
#0  0x0806f2e0 in semop ()
#1  0x1f0812c0 in ?? ()
#2  0x08149778 in LWLockRelease (lockid=WALWriteLock) at /root/src/postgres/src/back