Re: [HACKERS] Postgres server goes in recovery mode repeteadly
daveg writes:
> We have had this deployed in our test and production environments for a
> couple weeks now. We have not seen any further instance of the problem.
> Without the patch, we would have expected to see at least a few by now.
> So the patch appears to be effective.

Cool, thanks for the follow-up.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
On Fri, Oct 02, 2009 at 07:57:13PM -0700, daveg wrote:
> On Fri, Oct 02, 2009 at 10:41:07AM -0400, Alvaro Herrera wrote:
> > daveg wrote:
> >
> > > I work with Kunal and have been looking into this. It appears to be the
> > > same as the bug described in:
> > >
> > > http://archives.postgresql.org/pgsql-bugs/2009-09/msg00355.php
> > >
> > > as I have localized it to a NULL pointer dereference in
> > > RelationCacheInitializePhase2() as well. Tom speculates in:
> > >
> > > http://archives.postgresql.org/pgsql-bugs/2009-09/msg00372.php
> > >
> > > that large numbers of table drops might trigger this. The system in
> > > question creates and drops temp tables at a high rate, which tends to
> > > confirm this.
> >
> > Did you test the patch posted by Tom?
>
> We have been testing it since last night in our test environment. If it does
> not break anything (unlikely) we will deploy it next week. However, since the
> problem is only occasional, happening every few days on one of 50+ hosts,
> it will take some extended time without further segfaults to say anything
> confident about the patch's effectiveness.

We have had this deployed in our test and production environments for a
couple of weeks now. We have not seen any further instance of the problem.
Without the patch, we would have expected to see at least a few by now.
So the patch appears to be effective.

-dg

--
David Gould       da...@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
On Fri, Oct 02, 2009 at 10:41:07AM -0400, Alvaro Herrera wrote:
> daveg wrote:
>
> > I work with Kunal and have been looking into this. It appears to be the same
> > as the bug described in:
> >
> > http://archives.postgresql.org/pgsql-bugs/2009-09/msg00355.php
> >
> > as I have localized it to a NULL pointer dereference in
> > RelationCacheInitializePhase2() as well. Tom speculates in:
> >
> > http://archives.postgresql.org/pgsql-bugs/2009-09/msg00372.php
> >
> > that large numbers of table drops might trigger this. The system in question
> > creates and drops temp tables at a high rate, which tends to confirm this.
>
> Did you test the patch posted by Tom?

We have been testing it since last night in our test environment. If it does
not break anything (unlikely) we will deploy it next week. However, since the
problem is only occasional, happening every few days on one of 50+ hosts,
it will take some extended time without further segfaults to say anything
confident about the patch's effectiveness.

-dg

--
David Gould       da...@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
daveg wrote:
> I work with Kunal and have been looking into this. It appears to be the same
> as the bug described in:
>
> http://archives.postgresql.org/pgsql-bugs/2009-09/msg00355.php
>
> as I have localized it to a NULL pointer dereference in
> RelationCacheInitializePhase2() as well. Tom speculates in:
>
> http://archives.postgresql.org/pgsql-bugs/2009-09/msg00372.php
>
> that large numbers of table drops might trigger this. The system in question
> creates and drops temp tables at a high rate, which tends to confirm this.

Did you test the patch posted by Tom?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
On Tue, Sep 29, 2009 at 09:52:06PM +0530, kunal sharma wrote:
> Hi,
>
> We are using Postgres 8.4 and it has been found going into recovery
> mode a couple of times. The server process seems to fork another child
> process, which is another postgres server running under the same data
> directory, and after some time it goes away while the old server is still
> running. There were a few load issues on the server, but the load didn't
> go above "32".
>
> We are running openSUSE 10.2 x86_64 with 32GB of physical memory.
> Checking the logs, I found that there's a segmentation fault:
>
> Sep 26 05:39:54 pace kernel: postgres[28694]: segfault at 0030
> rip 0066ba8c rsp 7fffd364da30 error 4
>
> gdb dump shows this:
>
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
> (gdb)
>
> Any suggestions as to what is causing this segmentation fault?

I work with Kunal and have been looking into this. It appears to be the same
as the bug described in:

http://archives.postgresql.org/pgsql-bugs/2009-09/msg00355.php

as I have localized it to a NULL pointer dereference in
RelationCacheInitializePhase2() as well. Tom speculates in:

http://archives.postgresql.org/pgsql-bugs/2009-09/msg00372.php

that large numbers of table drops might trigger this. The system in question
creates and drops temp tables at a high rate, which tends to confirm this.

-dg

--
David Gould       da...@sonic.net      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
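
The kernel log line quoted above can be checked against the NULL pointer diagnosis. What follows is a minimal Python sketch, assuming the usual Linux x86 page-fault error-code layout (bit 0 = protection violation, bit 1 = write access, bit 2 = user mode) and assuming the log fields are hexadecimal. The function name `decode_segfault` and the regular expression are illustrative only; they are not part of PostgreSQL or of any kernel tool.

```python
import re

# Bits of the x86 page-fault error code printed after "error" in the log line.
PF_PROT = 0x1   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x2  # 0 = read access,      1 = write access
PF_USER = 0x4   # 0 = kernel mode,      1 = user mode

# Matches the older "rip/rsp" log format seen in this thread (assumed layout).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+) "
    r"rip (?P<rip>[0-9a-fA-F]+) rsp (?P<rsp>[0-9a-fA-F]+) "
    r"error (?P<err>[0-9a-fA-F]+)"
)


def decode_segfault(line):
    """Parse a kernel segfault log line into its fields and decoded error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognized segfault log line")
    err = int(m.group("err"), 16)
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "address": int(m.group("addr"), 16),  # faulting address
        "rip": int(m.group("rip"), 16),       # instruction pointer at the fault
        "access": "write" if err & PF_WRITE else "read",
        "mode": "user" if err & PF_USER else "kernel",
        "cause": "protection violation" if err & PF_PROT else "page not present",
    }


if __name__ == "__main__":
    line = ("Sep 26 05:39:54 pace kernel: postgres[28694]: segfault at 0030 "
            "rip 0066ba8c rsp 7fffd364da30 error 4")
    for key, value in decode_segfault(line).items():
        print(key, "=", hex(value) if key in ("address", "rip") else value)
```

For the line in this thread, the faulting address is 0x30 and error 4 decodes to a user-mode read of a page that is not mapped. A read at a small offset from address zero is what fetching a struct field through a NULL pointer typically looks like, so the log line is at least consistent with the diagnosis above.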
Re: [ADMIN] [HACKERS] Postgres server goes in recovery mode repeteadly
kunal sharma writes:
> gdb backtrace-
> (gdb) bt full
> #0  0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
> No symbol table info available.
> #1  0x005a39bc in ServerLoop () at postmaster.c:1304
> timeout = {tv_sec = 55, tv_usec = 352000}

I think what you're showing us is a stack trace of an idle postmaster
process, not the process that crashed.

regards, tom lane
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
gdb backtrace:

(gdb) bt full
#0  0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
No symbol table info available.
#1  0x005a39bc in ServerLoop () at postmaster.c:1304
        timeout = {tv_sec = 55, tv_usec = 352000}
        rmask = {fds_bits = {24, 0 }}
        selres =
        readmask = {fds_bits = {24, 0 }}
        nSockets = 5
        now = 1254241068
        last_touch_time = 1254238950
        __func__ = "ServerLoop"
#2  0x005a4dba in PostmasterMain (argc=3, argv=0xb1e3d0) at postmaster.c:1040
        fpidfile = (FILE *) 0x3
        opt =
        status =
        userDoption = 0x1
        __func__ = "PostmasterMain"
#3  0x00553b5e in main (argc=3, argv=0xb1e3d0) at main.c:188
No locals.
(gdb)

2009/9/29 Andrew Dunstan
>
> kunal sharma wrote:
>
>> Hi,
>>
>> We are using Postgres 8.4 and it has been found going into recovery
>> mode a couple of times. The server process seems to fork another child
>> process, which is another postgres server running under the same data
>> directory, and after some time it goes away while the old server is still
>> running. There were a few load issues on the server, but the load didn't
>> go above "32".
>>
>> We are running openSUSE 10.2 x86_64 with 32GB of physical memory.
>> Checking the logs, I found that there's a segmentation fault:
>>
>> Sep 26 05:39:54 pace kernel: postgres[28694]: segfault at 0030
>> rip 0066ba8c rsp 7fffd364da30 error 4
>>
>> gdb dump shows this:
>>
>> Reading symbols from /lib64/libdl.so.2...done.
>> Loaded symbols for /lib64/libdl.so.2
>> Reading symbols from /lib64/libm.so.6...done.
>> Loaded symbols for /lib64/libm.so.6
>> Reading symbols from /lib64/libc.so.6...done.
>> Loaded symbols for /lib64/libc.so.6
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Reading symbols from /lib64/libnss_files.so.2...done.
>> Loaded symbols for /lib64/libnss_files.so.2
>> 0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
>> (gdb)
>>
>
> Please try to get a backtrace from gdb.
>
> cheers
>
> andrew
>
Re: [HACKERS] Postgres server goes in recovery mode repeteadly
kunal sharma wrote:
> Hi,
>
> We are using Postgres 8.4 and it has been found going into recovery
> mode a couple of times. The server process seems to fork another child
> process, which is another postgres server running under the same data
> directory, and after some time it goes away while the old server is still
> running. There were a few load issues on the server, but the load didn't
> go above "32".
>
> We are running openSUSE 10.2 x86_64 with 32GB of physical memory.
> Checking the logs, I found that there's a segmentation fault:
>
> Sep 26 05:39:54 pace kernel: postgres[28694]: segfault at 0030
> rip 0066ba8c rsp 7fffd364da30 error 4
>
> gdb dump shows this:
>
> Reading symbols from /lib64/libdl.so.2...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libm.so.6...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
> (gdb)

Please try to get a backtrace from gdb.

cheers

andrew
[HACKERS] Postgres server goes in recovery mode repeteadly
Hi,

We are using Postgres 8.4 and it has been found going into recovery mode a
couple of times. The server process seems to fork another child process,
which is another postgres server running under the same data directory, and
after some time it goes away while the old server is still running. There
were a few load issues on the server, but the load didn't go above "32".

We are running openSUSE 10.2 x86_64 with 32GB of physical memory.
Checking the logs, I found that there's a segmentation fault:

Sep 26 05:39:54 pace kernel: postgres[28694]: segfault at 0030 rip 0066ba8c rsp 7fffd364da30 error 4

gdb dump shows this:

Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
0x2ad6d7b8c2b3 in __select_nocancel () from /lib64/libc.so.6
(gdb)

Any suggestions as to what is causing this segmentation fault?