Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On 15 December 2010 01:35, Robert Haas robertmh...@gmail.com wrote:
> I am suspicious of the fact that you are invoking initdb as ./initdb. Is it possible you're invoking this from the build tree, and there's an installed copy out there that doesn't match, but is getting used? Like maybe in /usr/local/pgsql/bin?

No, I'm not doing that. I'm running initdb from /usr/local/pgsql/bin (nothing pg related can be found in my $PATH), but it's the only copy on my system, which was installed from git master last night. It has debugging symbols, and I've actually re-created this from initdb's point of view within GDB with source level debugging.

> Can you fire up gdb on this core dump, using gdb /usr/local/pgsql/bin/postgres /path/to/coredump? Or, another possibility is to run initdb with --noclean and then run the command, without routing the output to /dev/null:
>
> /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1

I cannot find the coredump. Perhaps it's a permissions issue. What do you think?

Anyway, I have produced a useful backtrace by debugging postgres directly after running initdb with --noclean as described:

[pe...@peter bin]$ /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
Segmentation fault
[pe...@peter bin]$ gdb postgres
GNU gdb (GDB) Fedora (7.2-26.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying
and show warranty for details.
This GDB was configured as x86_64-redhat-linux-gnu.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/local/pgsql/bin/postgres...done.
(gdb) set args --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
(gdb) start
Temporary breakpoint 1 at 0x577360
Starting program: /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1

Temporary breakpoint 1, 0x00577360 in main ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0047615b in _bt_preprocess_keys ()
(gdb) bt
#0 0x0047615b in _bt_preprocess_keys ()
#1 0x00475382 in _bt_first ()
#2 0x00473d71 in btgettuple ()
#3 0x006ba67c in FunctionCall2 ()
#4 0x0046e08a in index_getnext ()
#5 0x0046d556 in systable_getnext ()
#6 0x006a92bf in LookupOpclassInfo ()
#7 0x006a9a58 in RelationInitIndexAccessInfo ()
#8 0x006aa9cb in RelationBuildDesc ()
#9 0x006aabfd in load_critical_index ()
#10 0x006ac12a in RelationCacheInitializePhase3 ()
#11 0x006c19ca in InitPostgres ()
#12 0x0060058f in PostgresMain ()
#13 0x0057774d in main ()

For some reason, postgres has limited debugging symbols (no line number information is available). Given that it is available from initdb, that seems very odd:

Temporary breakpoint 1 at 0x577360
Starting program: /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1

Temporary breakpoint 1, 0x00577360 in main ()
(gdb) n
Single stepping until exit from function main,
which has no line number information.

Program received signal SIGSEGV, Segmentation fault.
0x0047615b in _bt_preprocess_keys ()

Hope that helps.

--
Regards,
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On Wed, Dec 15, 2010 at 6:07 AM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
> On 15 December 2010 01:35, Robert Haas robertmh...@gmail.com wrote:
>> I am suspicious of the fact that you are invoking initdb as ./initdb. Is it possible you're invoking this from the build tree, and there's an installed copy out there that doesn't match, but is getting used? Like maybe in /usr/local/pgsql/bin?
>
> No, I'm not doing that. I'm running initdb from /usr/local/pgsql/bin (nothing pg related can be found in my $PATH), but it's the only copy on my system, which was installed from git master last night. It has debugging symbols, and I've actually re-created this from initdb's point of view within GDB with source level debugging.

Well, something's clearly funky here because your initdb has debugging symbols but your postgres executable does not. I may be missing something obvious, but I don't see how that can happen without mixing up two different builds.

>> Can you fire up gdb on this core dump, using gdb /usr/local/pgsql/bin/postgres /path/to/coredump? Or, another possibility is to run initdb with --noclean and then run the command, without routing the output to /dev/null:
>>
>> /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
>
> I cannot find the coredump. Perhaps it's a permissions issue. What do you think?

It would presumably get dumped into the data directory. So if --noclean isn't used I expect it'll get nuked.

> Anyway, I have produced a useful backtrace by debugging postgres directly after running initdb with --noclean as described:
>
> [pe...@peter bin]$ /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
> Segmentation fault
> [pe...@peter bin]$ gdb postgres
> GNU gdb (GDB) Fedora (7.2-26.fc14)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type show copying
> and show warranty for details.
> This GDB was configured as x86_64-redhat-linux-gnu.
> For bug reporting instructions, please see:
> http://www.gnu.org/software/gdb/bugs/...
> Reading symbols from /usr/local/pgsql/bin/postgres...done.
> (gdb) set args --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
> (gdb) start
> Temporary breakpoint 1 at 0x577360
> Starting program: /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
>
> Temporary breakpoint 1, 0x00577360 in main ()
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0047615b in _bt_preprocess_keys ()
> (gdb) bt
> #0 0x0047615b in _bt_preprocess_keys ()
> #1 0x00475382 in _bt_first ()
> #2 0x00473d71 in btgettuple ()
> #3 0x006ba67c in FunctionCall2 ()
> #4 0x0046e08a in index_getnext ()
> #5 0x0046d556 in systable_getnext ()
> #6 0x006a92bf in LookupOpclassInfo ()
> #7 0x006a9a58 in RelationInitIndexAccessInfo ()
> #8 0x006aa9cb in RelationBuildDesc ()
> #9 0x006aabfd in load_critical_index ()
> #10 0x006ac12a in RelationCacheInitializePhase3 ()
> #11 0x006c19ca in InitPostgres ()
> #12 0x0060058f in PostgresMain ()
> #13 0x0057774d in main ()

Ugh. Maybe someone smarter can figure out what that means, but I have no clue. _bt_preprocess_keys() is a pretty good-sized function; there's no obvious way to know which pointer reference is blowing up without line-number information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
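A quick way to see whether an installed binary carries any line-number information, before starting gdb, is to look for a .debug_line section in it. The following is a minimal sketch, assuming binutils' readelf is installed; the function name has_line_info is illustrative and does not come from the thread.

```shell
# has_line_info FILE
# Succeeds if FILE is an ELF file that contains a .debug_line section,
# which is where DWARF line-number tables are stored.
has_line_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_line'
}

# Path taken from the thread; adjust it for your own install prefix.
bin=/usr/local/pgsql/bin/postgres
if has_line_info "$bin"; then
    echo "$bin: line-number information present"
else
    echo "$bin: no line-number information (or not a readable ELF file)"
fi
```

Note that this only reports whether the section exists at all. A binary linked from a mix of debug and non-debug object files would still pass the check while lacking line numbers for some functions, so a positive result does not rule out the situation described in this thread.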
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On 15 December 2010 16:26, Robert Haas robertmh...@gmail.com wrote:
> On Wed, Dec 15, 2010 at 6:07 AM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
>> On 15 December 2010 01:35, Robert Haas robertmh...@gmail.com wrote:
>>> I am suspicious of the fact that you are invoking initdb as ./initdb. Is it possible you're invoking this from the build tree, and there's an installed copy out there that doesn't match, but is getting used? Like maybe in /usr/local/pgsql/bin?
>>
>> No, I'm not doing that. I'm running initdb from /usr/local/pgsql/bin (nothing pg related can be found in my $PATH), but it's the only copy on my system, which was installed from git master last night. It has debugging symbols, and I've actually re-created this from initdb's point of view within GDB with source level debugging.
>
> Well, something's clearly funky here because your initdb has debugging symbols but your postgres executable does not. I may be missing something obvious, but I don't see how that can happen without mixing up two different builds.

Just to make sure that I'm not going crazy, I did a git pull, rebuilt pg passing --enable-debug and --enable-cassert to configure as before, followed by make and make install. Then I tried this:

[pe...@peter bin]$ pwd
/usr/local/pgsql/bin
[pe...@peter bin]$ ls -l
total 7720
-rwxr-xr-x. 1 root root 53977 Dec 15 16:47 clusterdb
-rwxr-xr-x. 1 root root 55058 Dec 15 16:47 createdb
-rwxr-xr-x. 1 root root 58351 Dec 15 16:47 createlang
-rwxr-xr-x. 1 root root 58036 Dec 15 16:47 createuser
-rwxr-xr-x. 1 root root 53380 Dec 15 16:47 dropdb
-rwxr-xr-x. 1 root root 62052 Dec 15 16:47 droplang
-rwxr-xr-x. 1 root root 53382 Dec 15 16:47 dropuser
-rwxr-xr-x. 1 root root 707190 Dec 15 16:47 ecpg
-rwxr-xr-x. 1 root root 123447 Dec 15 16:47 initdb
-rwxr-xr-x. 1 root root 26435 Dec 15 16:47 pg_config
-rwxr-xr-x. 1 root root 25229 Dec 15 16:47 pg_controldata
-rwxr-xr-x. 1 root root 73784 Dec 15 16:47 pg_ctl
-rwxr-xr-x. 1 root root 301781 Dec 15 16:47 pg_dump
-rwxr-xr-x. 1 root root 75323 Dec 15 16:47 pg_dumpall
-rwxr-xr-x. 1 root root 32015 Dec 15 16:47 pg_resetxlog
-rwxr-xr-x. 1 root root 131867 Dec 15 16:47 pg_restore
-rwxr-xr-x. 1 root root 91006 Dec 6 11:34 pg_upgrade
-rwxr-xr-x. 1 root root 5380671 Dec 15 16:47 postgres
lrwxrwxrwx. 1 root root 8 Dec 15 16:47 postmaster -> postgres
-rwxr-xr-x. 1 root root 398677 Dec 15 16:47 psql
-rwxr-xr-x. 1 root root 55257 Dec 15 16:47 reindexdb
-rwxr-xr-x. 1 root root 32410 Dec 15 16:47 vacuumdb
[pe...@peter bin]$ which postgres
/usr/bin/which: no postgres in (/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/peter/bin)
[pe...@peter bin]$ which initdb
/usr/bin/which: no initdb in (/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/peter/bin)

Observe that the initdb and postgres timestamps are the same. This laptop is less than 2 weeks old, and has never had any postgres packages installed on it. I can once again reproduce the problem, exactly as before. My postgres executable does have debugging symbols, just fewer than initdb (I'm not sure what the exact term is, but it lacks line information while having some debugging symbols).

>> I cannot find the coredump. Perhaps it's a permissions issue. What do you think?
>
> It would presumably get dumped into the data directory. So if --noclean isn't used I expect it'll get nuked.

It isn't there...it just looks like a virginal PGDATA directory.

> Ugh. Maybe someone smarter can figure out what that means, but I have no clue. _bt_preprocess_keys() is a pretty good-sized function; there's no obvious way to know which pointer reference is blowing up without line-number information.

That's a pity, because I don't have a clue how to get line number information. I could always try printf() debugging, but I really shouldn't have to.

--
Regards,
Peter Geoghegan
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On Wed, Dec 15, 2010 at 12:39 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
> Observe that the initdb and postgres timestamps are the same.

Hrm.

>>> I cannot find the coredump. Perhaps it's a permissions issue. What do you think?
>>
>> It would presumably get dumped into the data directory. So if --noclean isn't used I expect it'll get nuked.
>
> It isn't there...it just looks like a virginal PGDATA directory.

Double hrm.

I have no idea how you can be getting line number information for initdb but not postgres. I think what you're getting from postgres is normally what I'd expect to see without --enable-debug. It sounds like you are doing it right, but I have no explanation for the results.

What distro are you using? This can't be broken across the board, given the lack of me-toos. Can you use git bisect to figure out which commit broke it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
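The git bisect suggestion can be automated with git bisect run, which classifies each revision by the exit status of a command: 0 means good, 125 means the revision cannot be tested and should be skipped, and any other status from 1 to 127 means bad. The following is a rough sketch of such a helper; the function name bisect_step, the install prefix, and the scratch data directory are all hypothetical and not taken from the thread.

```shell
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical helper for "git bisect run".
#   returns 125 if the build fails (git bisect skips the revision)
#   returns 1   if the build works but the test fails (revision is bad)
#   returns 0   if both succeed (revision is good)
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125
    sh -c "$test_cmd"  >/dev/null 2>&1 || return 1
    return 0
}

# Intended use: save the function in a script OUTSIDE the work tree
# (git clean would delete an untracked script inside it), ending with:
#   bisect_step \
#     'git clean -dfxq && ./configure --prefix="$HOME/pg-bisect" --enable-debug --enable-cassert && make && make install' \
#     'rm -rf /tmp/bisect-data && "$HOME/pg-bisect/bin/initdb" -D /tmp/bisect-data'
# and drive it with:
#   git bisect start
#   git bisect bad                 # the revision that crashes
#   git bisect good <known-good>   # placeholder: a revision known to work
#   git bisect run sh /path/outside/tree/bisect-step.sh
```

Building every revision from a fully cleaned tree is deliberate here: it keeps object files from one revision from leaking into the next, at the cost of a full rebuild per step.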
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
Robert Haas robertmh...@gmail.com writes:
> What distro are you using? This can't be broken across the board, given the lack of metoos. Can you use git bisect to figure out which commit broke it?

Before that, have you tried the old standby of make distclean and a full rebuild/reinstall? The lack of buildfarm confirmation makes me highly suspicious that there's any real problem.

regards, tom lane
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
> Before that, have you tried the old standby of make distclean and a full rebuild/reinstall? The lack of buildfarm confirmation makes me highly suspicious that there's any real problem.

That's fixed both problems. I should have tried it much sooner. I guess that even though the binaries built were new, they were somehow linked with one or more older, release object files.

Thanks.

--
Regards,
Peter Geoghegan
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On Wed, Dec 15, 2010 at 2:40 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
>> Before that, have you tried the old standby of make distclean and a full rebuild/reinstall? The lack of buildfarm confirmation makes me highly suspicious that there's any real problem.
>
> That's fixed both problems. I should have tried it much sooner. I guess that even though the binaries built were new, they were somehow linked with one or more older, release object files.
>
> Thanks.

Gah. I assumed you had cleaned out your tree. Oh, well.

If you don't use --enable-depend, you can get this kind of issue. Even if you do, it's worth trying a full clean out (I use git clean -dfx) if you get something weird.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
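The clean-out and rebuild described above could be scripted roughly as follows. This is a sketch under assumptions, not a recipe from the thread: the run helper, the DRY_RUN switch, and the PREFIX default are illustrative, while the configure switches and git clean -dfx are the ones named in the messages. By default the script only prints the commands, because git clean -dfx deletes every untracked and ignored file in the work tree, including any uncommitted new files.

```shell
# Sketch: rebuild a PostgreSQL git checkout from a completely clean tree.
# PREFIX, DRY_RUN and run() are illustrative names, not from the thread.
PREFIX="${PREFIX:-/usr/local/pgsql}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Echo the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

clean_rebuild() {
    # Remove all untracked and ignored files, then configure with
    # dependency tracking so later header changes trigger recompiles.
    run git clean -dfx &&
    run ./configure --prefix="$PREFIX" --enable-debug --enable-cassert --enable-depend &&
    run make &&
    run make install
}

clean_rebuild
```

Running it with DRY_RUN=0 from the top of the source tree would perform the steps for real; make install may additionally need write access to the prefix, as the root-owned binaries in the earlier listing suggest.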
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
Excerpts from Peter Geoghegan's message of Wed Dec 15 16:40:41 -0300 2010:
>> Before that, have you tried the old standby of make distclean and a full rebuild/reinstall? The lack of buildfarm confirmation makes me highly suspicious that there's any real problem.
>
> That's fixed both problems. I should have tried it much sooner. I guess that even though the binaries built were new, they were somehow linked with one or more older, release object files.
>
> Thanks.

This is probably caused by failure to use the --enable-depend configure switch. I think we should try to make that the default on platforms that support it. It seems silly not to use it.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On 15 December 2010 19:43, Robert Haas robertmh...@gmail.com wrote:
> Gah. I assumed you had cleaned out your tree. Oh, well.
>
> If you don't use --enable-depend, you can get this kind of issue. Even if you do, it's worth trying a full clean out (I use git clean -dfx) if you get something weird.

Thanks for the tip. I guess it simply didn't occur to me to make distclean because I made the rather questionable assumption that it's only necessary when there are weird linking issues.

--
Regards,
Peter Geoghegan
[HACKERS] Segfault related to pg_authid when running initdb from git master
Here's the output I see when $SUBJECT occurs, on a pg freshly built from git master with --enable-debug and --enable-cassert:

[postg...@peter bin]$ uname -a
Linux peter.laptop 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
[postg...@peter bin]$ ./initdb -D /var/lib/pgsql/data
The files belonging to this database system will be owned by user postgres.
This user must also own the server process.

The database cluster will be initialized with locale en_IE.utf8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to english.

fixing permissions on existing directory /var/lib/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 24MB
creating configuration files ... ok
creating template1 database in /var/lib/pgsql/data/base/1 ... ok
initializing pg_authid ... sh: line 1: 23515 Segmentation fault (core dumped) /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 >/dev/null
child process exited with exit code 139
initdb: removing contents of data directory /var/lib/pgsql/data

I'm having difficulty producing a useful backtrace, because the segfault seemingly doesn't actually occur within initdb - it occurs within a postgres process. If someone could tell me the trick to attaching to that process under these circumstances, I could look into it further.

The trouble seems to occur here, at line 1224 of initdb.c:

for (line = pg_authid_setup; *line != NULL; line++)
    PG_CMD_PUTS(*line);

After I see the segmentation fault in stderr, gdb reports that initdb has received SIGPIPE.

Hope that helps.

--
Regards,
Peter Geoghegan
Re: [HACKERS] Segfault related to pg_authid when running initdb from git master
On Tue, Dec 14, 2010 at 5:21 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote:
> Here's the output I see when $SUBJECT occurs, on a pg freshly built from git master with --enable-debug and --enable-cassert:

I am suspicious of the fact that you are invoking initdb as ./initdb. Is it possible you're invoking this from the build tree, and there's an installed copy out there that doesn't match, but is getting used? Like maybe in /usr/local/pgsql/bin?

> creating template1 database in /var/lib/pgsql/data/base/1 ... ok
> initializing pg_authid ... sh: line 1: 23515 Segmentation fault (core dumped) /usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1 >/dev/null
> child process exited with exit code 139

Can you fire up gdb on this core dump, using gdb /usr/local/pgsql/bin/postgres /path/to/coredump? Or, another possibility is to run initdb with --noclean and then run the command, without routing the output to /dev/null:

/usr/local/pgsql/bin/postgres --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
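The second suggestion (keep the data directory with --noclean, then rerun the backend by hand) can be combined with gdb's batch mode so that the backtrace is printed without an interactive session. This is a minimal sketch, assuming gdb is installed and that the data directory is supplied through the PGDATA environment variable; that last point is an assumption, since the command line shown in the thread carries no -D option. The function name backtrace_cmd and the POSTGRES variable are illustrative.

```shell
# Print (do not run) a gdb command line that reruns the crashing backend
# in batch mode and prints a backtrace once the program stops.
POSTGRES="${POSTGRES:-/usr/local/pgsql/bin/postgres}"

backtrace_cmd() {
    # --batch: exit after the -ex commands have run
    # --args:  everything after it is the program and its arguments
    echo gdb --batch -ex run -ex bt --args "$POSTGRES" \
        --single -F -O -c search_path=pg_catalog -c exit_on_error=true template1
}

backtrace_cmd
# To execute it against a data directory kept by "initdb --noclean":
#   PGDATA=/var/lib/pgsql/data sh -c "$(backtrace_cmd)" </dev/null
```

Redirecting standard input from /dev/null means a backend that does not crash sees end-of-input and exits instead of waiting for commands; in that case gdb has no stack to print.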