Re: [HACKERS] signal 11 on AIX: 7.4.2
On Sat, Sep 18, 2004 at 06:06:05AM -0400, Jan Wieck wrote:
> On 9/17/2004 7:32 PM, Tom Lane wrote:
>> over time. I'm wondering about DNS lookup results in particular.
>
> Except for one localhost, one /tmp/.s.PGSQL... and the 543x lookup during the postmaster start, all lookups are IP addresses with AI_NUMERICHOST set. And we have checked with tcpdump that the box really does not issue DNS lookups.

Just for the sake of posterity, it appears that this is actually a libc problem on AIX. In particular, there's a patched libc fileset which was released to solve a problem where getaddrinfo() returns an error on valid input.

IBM's AIX support was unwilling to give us libraries with debug symbols built in, but they did point me at a new fileset for libc. We've been running a test load which fairly consistently produced sig 11s before, and haven't seen one since. So we don't have a perfect explanation, but it looks like this is the cause.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The plural of anecdote is not data.
        --Roger Brinner

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Fri, Sep 17, 2004 at 07:32:30PM -0400, Tom Lane wrote:
> involve consulting DNS? If so, try to correlate the crash probability with changes in your DNS zone contents ...

No changes. The systems in question have no access to DNS. /etc/hosts only.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
        --Philip Greenspun
Re: [HACKERS] signal 11 on AIX: 7.4.2
On 9/17/2004 7:32 PM, Tom Lane wrote:
> Jan Wieck [EMAIL PROTECTED] writes:
>> The problem comes and goes. So either I can cause a coredump just on the snap by running a shellscript that does 100 psql -c "select version()" calls, or it is next to impossible to crash it at all.
>
> Hmm, that's really bizarre. It seems like the only satisfactory explanation for that would involve some external condition that varies over time. I'm wondering about DNS lookup results in particular. What values are you asking getaddrinfo to look up, and might those involve consulting DNS? If so, try to correlate the crash probability with changes in your DNS zone contents ...
>
>                         regards, tom lane

Except for one localhost, one /tmp/.s.PGSQL... and the 543x lookup during the postmaster start, all lookups are IP addresses with AI_NUMERICHOST set. And we have checked with tcpdump that the box really does not issue DNS lookups.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#== [EMAIL PROTECTED] #
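[Editor's sketch] The numeric-only lookups Jan describes can be illustrated roughly as follows. This is a minimal sketch, not the code in PostgreSQL's ip.c: numeric_lookup is a hypothetical helper name, and the choice of AF_UNSPEC and SOCK_STREAM hints is an assumption. With AI_NUMERICHOST set, getaddrinfo() is expected to reject anything that is not an address literal instead of consulting a resolver, which matches the tcpdump observation that no DNS traffic leaves the box.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve a numeric host string without consulting any name service.
 * Returns 0 on success or a getaddrinfo() error code. On success the
 * caller owns *result and must release it with freeaddrinfo().
 * (Hypothetical helper for illustration; not a PostgreSQL function.)
 */
int
numeric_lookup(const char *host, const char *service,
               struct addrinfo **result)
{
    struct addrinfo hints;

    assert(host != NULL && result != NULL);

    memset(&hints, 0, sizeof(hints));   /* unused hint fields must be zero */
    hints.ai_family = AF_UNSPEC;        /* accept IPv4 or IPv6 literals */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;    /* fail instead of asking a resolver */

    *result = NULL;
    return getaddrinfo(host, service, &hints, result);
}
```

Calling numeric_lookup("127.0.0.1", "5432", &res) should succeed without any network access, while a host name such as "localhost" should come back with a non-zero error code because it is not an address literal.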
Re: [HACKERS] signal 11 on AIX: 7.4.2
On 4/19/2004 1:18 PM, Jan Wieck wrote:
> Tom Lane wrote:
>> Andrew Sullivan [EMAIL PROTECTED] writes:
>>> On Thu, Apr 15, 2004 at 07:52:59PM -0400, Tom Lane wrote:
>>>> I can see from your trace that you are using the getaddrinfo code from libc, but where is configure finding a header that declares struct addrinfo?
>>>
>>> Hrm, I can't seem to tell. I see this in config.log, but it isn't telling me where it found it. Am I looking in the wrong place?
>>
>> What you'd need to do is determine which system headers are being #include'd by that config test, and then look through them to find struct addrinfo.
>
> judging by gdb's structure printing, the crashed postgres instance used the non-43 compatible 64-bit version of the structure. What I don't really get is that the whole exercise seems to have scribbled over the stack. The hints pointer originating from the on-stack structure in parse_hba is somehow pointing into the blue.

This issue is still not closed and it is hitting us more and more. So I would like to add some more of what we have done in the hope to get some more ideas.

The "scribbled over the stack" part turned out to be not true. The stack dump is fine if compiled with -O0. The problem persists in 7.4.5.

I have tried to isolate the getaddrinfo() calls by writing a program that does the getaddrinfo() calls done during PM startup, then keeps 100-200 child processes in a fork()/wait() loop, and every child process does the same getaddrinfo() calls a starting backend would perform during the pg_hba parsing. This program does not crash.

So far we did not get a libc from IBM that has debug symbols. So I only know that getaddrinfo() calls getaddrinfo2(), which calls memmove(), and that one crashes with a SIGSEGV. All the call arguments to getaddrinfo() look absolutely fine. I hope to get that libc any time soon to see what exactly that memmove tries to access.

The problem comes and goes. So either I can cause a coredump just on the snap by running a shellscript that does 100 psql -c "select version()" calls, or it is next to impossible to crash it at all.

There are numerous reports on the net about getaddrinfo() causing grief on AIX and it seems to be IPV6 related. For the moment we intend to replace the call with a slightly limited implementation using inet_aton() in getaddrinfo_all() whenever AI_NUMERICHOST is set. This will lose us the IPV6 support as hba.c can't parse those pg_hba.conf lines any more. So it is not a satisfactory workaround for PostgreSQL. But I will make that patch available tomorrow night in the event someone else finds it useful.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#== [EMAIL PROTECTED] #
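[Editor's sketch] Jan's patch is not included in this thread, so the following is only a guess at the general shape of an inet_aton()-based stand-in for numeric lookups. The function name numeric_host_ipv4 and its signature are hypothetical. It shows why the workaround gives up IPv6: inet_aton() only understands IPv4 notation, so an IPv6 literal from pg_hba.conf is rejected.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * IPv4-only stand-in for a numeric-host lookup. Parses the address
 * string with inet_aton() and fills in a caller-supplied sockaddr_in.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address.
 * (Hypothetical helper for illustration; not Jan's actual patch.)
 */
int
numeric_host_ipv4(const char *host, unsigned short port,
                  struct sockaddr_in *out)
{
    struct in_addr addr;

    assert(host != NULL && out != NULL);

    if (inet_aton(host, &addr) == 0)
        return -1;              /* not IPv4 notation; IPv6 literals land here */

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);    /* port is stored in network byte order */
    out->sin_addr = addr;
    return 0;
}
```

Note that inet_aton() also accepts shorthand and non-decimal IPv4 forms (for example "127.1"), so a real replacement would probably want stricter validation than this sketch performs.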
Re: [HACKERS] signal 11 on AIX: 7.4.2
Jan Wieck [EMAIL PROTECTED] writes:
> The problem comes and goes. So either I can cause a coredump just on the snap by running a shellscript that does 100 psql -c "select version()" calls, or it is next to impossible to crash it at all.

Hmm, that's really bizarre. It seems like the only satisfactory explanation for that would involve some external condition that varies over time. I'm wondering about DNS lookup results in particular. What values are you asking getaddrinfo to look up, and might those involve consulting DNS? If so, try to correlate the crash probability with changes in your DNS zone contents ...

                        regards, tom lane
Re: [HACKERS] signal 11 on AIX: 7.4.2
>> My only guess is that getaddrinfo in your libc has a bug somehow that is corrupting the stack (hence the improper backtrace), then crashing.
>
> It could be libc on AIX, I suppose, but it strikes me as sort of odd that nobody else ever sees this. Unless nobody else is using AIX 5.1, which is of course possible.

I can confirm that AIX 4.3.2 getaddrinfo is at least a bit *funny*. getaddrinfo seems to not honour nsorder and only does dns, even though the manual says:

  Should there be any discrepancies between this description and the POSIX description, the POSIX description takes precedence.

The function does return multiple entries; often the first is not the best.

Log is:

LOG:  could not translate service 5432 to address: Host not found
WARNING:  could not create listen socket for *
LOG:  could not bind socket for statistics collector: Can't assign requested address
LOG:  disabling statistics collector for lack of working socket

This area probably needs a fix/workaround on AIX :-(

Andreas
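[Editor's sketch] Andreas notes that getaddrinfo() can return several entries and that the first is often not the best one. One way a caller might cope is to walk the ai_next chain and choose an entry deliberately rather than taking the head of the list. The helper below is a hypothetical illustration of that idea, not code from PostgreSQL; first_of_family and its selection rule (match on address family) are assumptions.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return the first entry in a getaddrinfo() result list whose address
 * family equals 'family', or NULL if the list has no such entry.
 * (Hypothetical helper for illustration.)
 */
const struct addrinfo *
first_of_family(const struct addrinfo *list, int family)
{
    const struct addrinfo *ai;

    assert(family != AF_UNSPEC);    /* the caller must name a concrete family */

    /* Follow the singly linked list until a matching entry turns up. */
    for (ai = list; ai != NULL; ai = ai->ai_next)
    {
        if (ai->ai_family == family)
            return ai;
    }
    return NULL;
}
```

A caller that prefers IPv4 could try first_of_family(res, AF_INET) and fall back to the head of the list only if that returns NULL.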
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Thu, Jun 17, 2004 at 06:06:12PM -0400, Bruce Momjian wrote:
> When you say init directory, what do you mean? /bin?

No. The place where the init scripts (which cause postgres to start) live.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-garde will probably become the textbook definition of Postmodernism.
        --Brad Holland
Re: [HACKERS] signal 11 on AIX: 7.4.2
Quoth [EMAIL PROTECTED] (Bruce Momjian):
> Andrew Sullivan wrote:
>> On Thu, Jun 17, 2004 at 01:12:10PM -0400, Bruce Momjian wrote:
>>> Well, the bad news is that this backtrace isn't very useful.
>>
>> No kidding. It's pretty frustrating.
>>
>>> My only guess is that getaddrinfo in your libc has a bug somehow that is corrupting the stack (hence the improper backtrace), then crashing.
>>
>> It could be libc on AIX, I suppose, but it strikes me as sort of odd that nobody else ever sees this. Unless nobody else is using AIX 5.1, which is of course possible. One hypothesis is that this is happening at start up time (this core dump didn't show up in the data/ area, but in the init directory, however, which makes that theory a little suspect).
>
> When you say init directory, what do you mean? /bin?

No, it's a directory with various init-like scripts.

In premium hosting environments, root access is restricted to the site operators, so PostgreSQL doesn't get started up from /etc/init.d. Instead, PostgreSQL and other services get invoked by custom init scripts in a custom init directory.

--
let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;;
http://www.ntlug.org/~cbbrowne/sap.html
I am a bomb technician. If you see me running, try to keep up...
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Mon, May 10, 2004 at 11:59:40AM -0400, Andrew Sullivan wrote:
> On the weekend, we ran a set of tests on the offending system to see if we could re-create it. We set up the triggering conditions just as they'd been when it happened, and alas, no segfault. So although this was pretty much regularly reproducible when it actually happened, it's now a note to the Journal of Irreproducible Results. I hate when that happens.

I hate it even more when the symptom comes back inexplicably. We had it again. For the record, here's what gdb says (there are some high-bit characters in here; dunno how they'll come through in mail):

(gdb) bt
#0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
#1  0xd0326e1c in getaddrinfo2 () from /usr/lib/libc.a(shr.o)
#2  0xd0327b6c in getaddrinfo () from /usr/lib/libc.a(shr.o)
#3  0x10058668 in WriteControlFile () at xlog.c:2121
#4  0x101f8f78 in init_execution_state (src=0x202acd8c , argOidVect=0x7308710b, nargs=4, rettype=539520040, haspolyarg=-104 '\230') at functions.c:121
#5  0x101f9304 in init_sql_fcache (finfo=0xdeadbeef) at functions.c:250
#6  0x101fa57c in set_tz (tz=0x7308710b <Address 0x7308710b out of bounds>) at variable.c:261
#7  0x101fa9a4 in assign_timezone (value=0x202ad398 , doit=-1 'ÿ', interactive=-8 'ø') at variable.c:584
#8  0x1000466c in PostgresMain (argc=1, argv=0x2002cf38, username=0x1 ) at postgres.c:2560
#9  0x100040b0 in PostgresMain (argc=537240896, argv=0xdeadbeef, username=0xdeadbeef <Address 0xdeadbeef out of bounds>) at postgres.c:2307
#10 0x10002530 in exec_parse_message (query_string=0x2a24 , stmt_name=0x5 , paramTypes=0x0, numParams=0) at postgres.c:1216
#11 0x10001f84 in exec_simple_query (query_string=0x2005a540 'ÿ' <repeats 40 times>) at postgres.c:980
#12 0x15f0 in main (argc=1, argv=0xdeadbeef) at main.c:228

--
Andrew Sullivan | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now.
        --J.D. Baldwin
Re: [HACKERS] signal 11 on AIX: 7.4.2
Andrew Sullivan wrote:
> On Mon, May 10, 2004 at 11:59:40AM -0400, Andrew Sullivan wrote:
>> On the weekend, we ran a set of tests on the offending system to see if we could re-create it. We set up the triggering conditions just as they'd been when it happened, and alas, no segfault. So although this was pretty much regularly reproducible when it actually happened, it's now a note to the Journal of Irreproducible Results. I hate when that happens.
>
> I hate it even more when the symptom comes back inexplicably. We had it again. For the record, here's what gdb says (there are some high-bit characters in here; dunno how they'll come through in mail):
>
> (gdb) bt
> #0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
> #1  0xd0326e1c in getaddrinfo2 () from /usr/lib/libc.a(shr.o)
> #2  0xd0327b6c in getaddrinfo () from /usr/lib/libc.a(shr.o)
> #3  0x10058668 in WriteControlFile () at xlog.c:2121
> #4  0x101f8f78 in init_execution_state (src=0x202acd8c , argOidVect=0x7308710b, nargs=4, rettype=539520040, haspolyarg=-104 '\230') at functions.c:121
> #5  0x101f9304 in init_sql_fcache (finfo=0xdeadbeef) at functions.c:250
> #6  0x101fa57c in set_tz (tz=0x7308710b <Address 0x7308710b out of bounds>) at variable.c:261
> #7  0x101fa9a4 in assign_timezone (value=0x202ad398 , doit=-1 'ÿ', interactive=-8 'ø') at variable.c:584
> #8  0x1000466c in PostgresMain (argc=1, argv=0x2002cf38, username=0x1 ) at postgres.c:2560
> #9  0x100040b0 in PostgresMain (argc=537240896, argv=0xdeadbeef, username=0xdeadbeef <Address 0xdeadbeef out of bounds>) at postgres.c:2307
> #10 0x10002530 in exec_parse_message (query_string=0x2a24 , stmt_name=0x5 , paramTypes=0x0, numParams=0) at postgres.c:1216
> #11 0x10001f84 in exec_simple_query (query_string=0x2005a540 'ÿ' <repeats 40 times>) at postgres.c:980
> #12 0x15f0 in main (argc=1, argv=0xdeadbeef) at main.c:228

Well, the bad news is that this backtrace isn't very useful. It states the query you sent was 40 0xff's, and it says you called assign_timezone, which called set_tz, which then shows it calling init_sql_fcache() (impossible), which later calls WriteControlFile() (impossible), which calls getaddrinfo() (impossible).

My only guess is that getaddrinfo in your libc has a bug somehow that is corrupting the stack (hence the improper backtrace), then crashing.

As to the cause, I assume this is not reproducible, right? Is there something unusual about your DNS setup or something that might have changed recently that caused getaddrinfo() to do something new?

Of course, the memmove() might be causing the problem and the getaddrinfo is a corrupt part of the backtrace too.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Thu, Jun 17, 2004 at 01:12:10PM -0400, Bruce Momjian wrote:
> Well, the bad news is that this backtrace isn't very useful.

No kidding. It's pretty frustrating.

> My only guess is that getaddrinfo in your libc has a bug somehow that is corrupting the stack (hence the improper backtrace), then crashing.

It could be libc on AIX, I suppose, but it strikes me as sort of odd that nobody else ever sees this. Unless nobody else is using AIX 5.1, which is of course possible. One hypothesis is that this is happening at start up time (this core dump didn't show up in the data/ area, but in the init directory, however, which makes that theory a little suspect).

> As to the cause, I assume this is not reproducible, right? Is there

Well, it's reproduced itself a few times, but it isn't reproducible at will, and we have no clue what is causing it.

> something unusual about your DNS setup or something that might have changed recently that caused getaddrinfo() to do something new?

Nothing has changed recently, but we started having this not long after promoting an RS/6000 to production on AIX 5.1. Before that we were all-Solaris. We have never managed to tickle this on a test machine. It's pretty tough to guess what might be going on, at least for me. If there are any AIX gurus around, I'd sure like to talk to them. (I do have a budget to pay such gurus, BTW!)

> Of course, the memmove() might be causing the problem and the getaddrinfo is a corrupt part of the backtrace too.

Yeah, which is why it's so frustrating. If I could see what it was doing when it did it, I'd be able to tell. But without knowing why it's happening, there's no way to sit up for 6 weeks while I wait for it to happen.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that visionary and imaginative work need not end up well.
        --Dennis Ritchie
Re: [HACKERS] signal 11 on AIX: 7.4.2
Andrew Sullivan wrote:
> On Thu, Jun 17, 2004 at 01:12:10PM -0400, Bruce Momjian wrote:
>> Well, the bad news is that this backtrace isn't very useful.
>
> No kidding. It's pretty frustrating.
>
>> My only guess is that getaddrinfo in your libc has a bug somehow that is corrupting the stack (hence the improper backtrace), then crashing.
>
> It could be libc on AIX, I suppose, but it strikes me as sort of odd that nobody else ever sees this. Unless nobody else is using AIX 5.1, which is of course possible. One hypothesis is that this is happening at start up time (this core dump didn't show up in the data/ area, but in the init directory, however, which makes that theory a little suspect).

When you say init directory, what do you mean? /bin?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Wed, Apr 28, 2004 at 03:56:55PM -0400, Andrew Sullivan wrote:
> On Mon, Apr 26, 2004 at 03:19:21PM -0400, Bruce Momjian wrote:
>> Has this been resolved?
>
> it elsewhere. I've been trying some alternative approaches to causing it today, and so far no luck.

On the weekend, we ran a set of tests on the offending system to see if we could re-create it. We set up the triggering conditions just as they'd been when it happened, and alas, no segfault. So although this was pretty much regularly reproducible when it actually happened, it's now a note to the Journal of Irreproducible Results. I hate when that happens.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Mon, Apr 26, 2004 at 03:19:21PM -0400, Bruce Momjian wrote:
> Has this been resolved?

Not as far as I know. Unfortunately, the problem happened in an environment I Can't Play With, and I haven't been able to reproduce it elsewhere. I've been trying some alternative approaches to causing it today, and so far no luck. Jan is, AFAIK, similarly mystified about what happened.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [HACKERS] signal 11 on AIX: 7.4.2
Has this been resolved?

---

Andrew Sullivan wrote:
> On Mon, Apr 19, 2004 at 11:18:07AM -0400, Tom Lane wrote:
>> What you'd need to do is determine which system headers are being #include'd by that config test, and then look through them to find struct addrinfo.
>
> Well, I have this in /usr/include/netdb.h:
>
> struct addrinfo {
>         int             ai_flags;       /* AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST */
>         int             ai_family;      /* PF_xxx */
>         int             ai_socktype;    /* SOCK_xxx */
>         int             ai_protocol;    /* 0 or IPPROTO_xxx */
>         size_t          ai_addrlen;     /* length of ai_addr */
>         char            *ai_canonname;  /* canonical name for hostname */
>         struct sockaddr *ai_addr;       /* binary address */
>         struct addrinfo *ai_next;       /* next structure in list */
> };
>
> Using the cpp trick that Alvaro Herrera suggested, I see that file mentioned in the output, and this a little way along:
>
> struct addrinfo {
>         int ai_flags;
>         int ai_family;
>         int ai_socktype;
>         int ai_protocol;
>         size_t ai_addrlen;
>         char *ai_canonname;
>         struct sockaddr *ai_addr;
>         struct addrinfo *ai_next;
> };
>
> So it looks like that must be the one. Dunno if this helps.
>
> A
>
> --
> Andrew Sullivan | [EMAIL PROTECTED]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] signal 11 on AIX: 7.4.2
Andrew Sullivan [EMAIL PROTECTED] writes:
> On Thu, Apr 15, 2004 at 07:52:59PM -0400, Tom Lane wrote:
>> I can see from your trace that you are using the getaddrinfo code from libc, but where is configure finding a header that declares struct addrinfo?
>
> Hrm, I can't seem to tell. I see this in config.log, but it isn't telling me where it found it. Am I looking in the wrong place?

What you'd need to do is determine which system headers are being #include'd by that config test, and then look through them to find struct addrinfo.

A shortcut is just to grep through /usr/include and its subdirectories for addrinfo. If you only find one definition, then you don't really need to worry too much. But if there's more than one you need to determine which is getting used.

                        regards, tom lane
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Mon, Apr 19, 2004 at 11:18:07AM -0400, Tom Lane wrote:
> A shortcut is just to grep through /usr/include and its subdirectories for addrinfo. If you only find one definition, then you don't really need to worry too much. But if there's more than one you need to determine which is getting used.

Maybe an easier way is to examine the output of cpp src/include/c.h.

--
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic. (Irulan)
Re: [HACKERS] signal 11 on AIX: 7.4.2
Tom Lane wrote:
> Andrew Sullivan [EMAIL PROTECTED] writes:
>> On Thu, Apr 15, 2004 at 07:52:59PM -0400, Tom Lane wrote:
>>> I can see from your trace that you are using the getaddrinfo code from libc, but where is configure finding a header that declares struct addrinfo?
>>
>> Hrm, I can't seem to tell. I see this in config.log, but it isn't telling me where it found it. Am I looking in the wrong place?
>
> What you'd need to do is determine which system headers are being #include'd by that config test, and then look through them to find struct addrinfo.

Judging by gdb's structure printing, the crashed postgres instance used the non-43 compatible 64-bit version of the structure. What I don't really get is that the whole exercise seems to have scribbled over the stack. The hints pointer originating from the on-stack structure in parse_hba is somehow pointing into the blue.

Jan

> A shortcut is just to grep through /usr/include and its subdirectories for addrinfo. If you only find one definition, then you don't really need to worry too much. But if there's more than one you need to determine which is getting used.
>
>                         regards, tom lane

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#== [EMAIL PROTECTED] #
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Mon, Apr 19, 2004 at 11:18:07AM -0400, Tom Lane wrote:
> What you'd need to do is determine which system headers are being #include'd by that config test, and then look through them to find struct addrinfo.

Well, I have this in /usr/include/netdb.h:

struct addrinfo {
        int             ai_flags;       /* AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST */
        int             ai_family;      /* PF_xxx */
        int             ai_socktype;    /* SOCK_xxx */
        int             ai_protocol;    /* 0 or IPPROTO_xxx */
        size_t          ai_addrlen;     /* length of ai_addr */
        char            *ai_canonname;  /* canonical name for hostname */
        struct sockaddr *ai_addr;       /* binary address */
        struct addrinfo *ai_next;       /* next structure in list */
};

Using the cpp trick that Alvaro Herrera suggested, I see that file mentioned in the output, and this a little way along:

struct addrinfo {
        int ai_flags;
        int ai_family;
        int ai_socktype;
        int ai_protocol;
        size_t ai_addrlen;
        char *ai_canonname;
        struct sockaddr *ai_addr;
        struct addrinfo *ai_next;
};

So it looks like that must be the one. Dunno if this helps.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
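[Editor's sketch] Tom's worry is that the application and libc might disagree about the layout of struct addrinfo. Beyond reading the header, one way to see what the compiler actually uses is to print the structure's size and member offsets, building with the same compiler and flags as the server (32-bit versus 64-bit mode matters here). This is a minimal sketch; describe_addrinfo is a hypothetical helper, and the numbers it reports are only meaningful when compared against what the platform's libc expects.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the size and member offsets of struct addrinfo, as seen by
 * this compiler and these headers, into 'buf'. Returns the value of
 * snprintf(): the number of characters that were (or would have been)
 * written, or a negative value on error.
 * (Hypothetical helper for illustration.)
 */
int
describe_addrinfo(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen,
                    "sizeof=%lu flags=%lu family=%lu socktype=%lu "
                    "protocol=%lu addrlen=%lu canonname=%lu addr=%lu next=%lu",
                    (unsigned long) sizeof(struct addrinfo),
                    (unsigned long) offsetof(struct addrinfo, ai_flags),
                    (unsigned long) offsetof(struct addrinfo, ai_family),
                    (unsigned long) offsetof(struct addrinfo, ai_socktype),
                    (unsigned long) offsetof(struct addrinfo, ai_protocol),
                    (unsigned long) offsetof(struct addrinfo, ai_addrlen),
                    (unsigned long) offsetof(struct addrinfo, ai_canonname),
                    (unsigned long) offsetof(struct addrinfo, ai_addr),
                    (unsigned long) offsetof(struct addrinfo, ai_next));
}
```

The order of ai_canonname and ai_addr differs between platforms, so the offsets are expected to vary from system to system. What matters is whether two builds on the same machine (for example with and without an add-on resolver library's headers in the include path) report the same layout.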
[HACKERS] signal 11 on AIX: 7.4.2
We've had a backend crash with sig 11 during connection. My guess is there's something up with (maybe) the IPv6 support on AIX. I seem to recall something similar recently, but I can't find the post in the archives. Suggestions?

oxrslive=# SELECT version();
                                   version
------------------------------------------------------------------------------
 PostgreSQL 7.4.2 on powerpc-ibm-aix5.1.0.0, compiled by GCC 2.9-aix51-020209
(1 row)

GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.1.0.0"...
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
#0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
(gdb) bt
#0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
#1  0xd0326e1c in getaddrinfo2 () from /usr/lib/libc.a(shr.o)
#2  0xd0327b6c in getaddrinfo () from /usr/lib/libc.a(shr.o)
#3  0x1005860c in getaddrinfo_all (hostname=0x34e0 , servname=0x74696f <Address 0x74696f out of bounds>, hintp=0xf03a2e80, result=0x74696f) at ip.c:78
#4  0x101f9330 in parse_hba (line=0x202ae198, port=0x202a6988, found_p=0x2ff1f810 , error_p=0x2ff1f811 ) at hba.c:669
#5  0x101f96bc in check_hba (port=0x202a6988) at hba.c:793
#6  0x101fa934 in hba_getauthmethod (port=0x202b6f3c) at hba.c:1574
#7  0x101fad5c in ClientAuthentication (port=0x202a6988) at auth.c:415
#8  0x10004674 in BackendFork (port=0x202a6988) at postmaster.c:2444
#9  0x100040b8 in BackendStartup (port=0x202a6988) at postmaster.c:2207
#10 0x10002538 in ServerLoop () at postmaster.c:1119
#11 0x10001f8c in PostmasterMain (argc=1, argv=0x20270698) at postmaster.c:897
#12 0x15f0 in main (argc=1, argv=0x2ff22b8c) at main.c:214
(gdb)

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [HACKERS] signal 11 on AIX: 7.4.2
On Thu, Apr 15, 2004 at 01:07:33PM -0400, Andrew Sullivan wrote:
> We've had a backend crash with sig 11 during connection.

By the way, I failed to mention, but sig 11 is segfault on AIX.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [HACKERS] signal 11 on AIX: 7.4.2
Andrew Sullivan [EMAIL PROTECTED] writes:
> We've had a backend crash with sig 11 during connection. My guess is there's something up with (maybe) the IPv6 support on AIX.
>
> (gdb) bt
> #0  0xd01d7778 in memmove () from /usr/lib/libc.a(shr.o)
> #1  0xd0326e1c in getaddrinfo2 () from /usr/lib/libc.a(shr.o)
> #2  0xd0327b6c in getaddrinfo () from /usr/lib/libc.a(shr.o)
> #3  0x1005860c in getaddrinfo_all (hostname=0x34e0 , servname=0x74696f <Address 0x74696f out of bounds>, hintp=0xf03a2e80, result=0x74696f) at ip.c:78
> #4  0x101f9330 in parse_hba (line=0x202ae198, port=0x202a6988, found_p=0x2ff1f810 , error_p=0x2ff1f811 ) at hba.c:669

Hm, a crash inside the system-supplied getaddrinfo routine would suggest that there's something wrong with the values we are passing into it. The most likely bet is that we don't agree with libc about the layout of struct addrinfo.

The configure script goes out of its way to be paranoid about this, because we've seen it get confused by add-on libbind installations (see also the head comment in src/include/getaddrinfo.h) ... but I'll bet that AIX has found another way to trip it up.

I can see from your trace that you are using the getaddrinfo code from libc, but where is configure finding a header that declares struct addrinfo?

                        regards, tom lane