Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-17, Ted Unangst t...@tedunangst.com wrote:
>> -memcpy(&addrcopy, &addr, sizeof(addrcopy));
>> -memcpy(&maskcopy, &mask, sizeof(maskcopy));
>> +memcpy(&addrcopy, addr, addr->sa_len);
>> +memcpy(&maskcopy, mask, mask->sa_len);
>
> How did this ever work?

It didn't.
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-16 Mon 18:19 PM |, Hugo Osvaldo Barrera wrote:
> #3  0x11080cf8d1b1 in check_ip (raddr=0x110abc279918, addr=0x110a899f9058,
>     mask=0x110a899f9158) at hba.c:704

Is this an IPv6 thing?

Until recently, Squid crashed likewise:

Squid bug: 4024
Status: RESOLVED FIXED
log a warning but not abort if ::1 and *only* ::1 has failed to resolve.

http://marc.info/?l=openbsd-ports&m=141339262226378&w=2
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-15, Hugo Osvaldo Barrera h...@barrera.io wrote:
> Am I mistaken in understanding that this is an issue with postgresql itself,
> and not a local configuration error?

Correct.

> I tried building postgres with debug symbols (I added the flags described
> here[1] to the ports Makefile), but the backtrace is still useless:

Please would you rebuild from the original port like this:

make clean=all
make DEBUG="-O0 -g" repackage
sudo make reinstall

and see if this gives a better backtrace.
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-16 16:24, Stuart Henderson wrote:
> On 2015-02-15, Hugo Osvaldo Barrera h...@barrera.io wrote:
> > Am I mistaken in understanding that this is an issue with postgresql
> > itself, and not a local configuration error?
>
> Correct.
>
> > I tried building postgres with debug symbols (I added the flags described
> > here[1] to the ports Makefile), but the backtrace is still useless:
>
> Please would you rebuild from the original port like this:
>
> make clean=all
> make DEBUG="-O0 -g" repackage
> sudo make reinstall
>
> and see if this gives a better backtrace.

Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
Makefile with no success.

(gdb) bt
#0  0x110a2815b92a in kill () at <stdin>:2
#1  0x110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2  0x110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
#3  0x11080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
#4  0x11080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718
#5  0x11080cf91d34 in hba_getauthmethod (port=0x110a899f7800) at hba.c:2256
#6  0x11080cf88eb3 in ClientAuthentication (port=0x110a899f7800) at auth.c:307
#7  0x11080d1edf5d in PerformAuthentication (port=0x110a899f7800) at postinit.c:223
#8  0x11080d1eeae7 in InitPostgres (in_dbname=0x110af4508c00 "virtstart-dev", dboid=0, username=0x110af4508be0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
#9  0x11080d0a3eb1 in PostgresMain (argc=1, argv=0x110af4508c20, dbname=0x110af4508c00 "virtstart-dev", username=0x110af4508be0 "virtstart-dev") at postgres.c:3749
#10 0x11080d033537 in BackendRun (port=Could not find the frame base for "BackendRun".
) at postmaster.c:4155
#11 0x11080d032be8 in BackendStartup (port=0x110a899f7800) at postmaster.c:3829
#12 0x11080d02f2d0 in ServerLoop () at postmaster.c:1597
#13 0x11080d02e968 in PostmasterMain (argc=3, argv=0x7f7d9658) at postmaster.c:1244
#14 0x11080cf96dc8 in main (argc=Could not find the frame base for "main".
) at main.c:228
Current language:  auto; currently asm

This doesn't say much to me though. I guess my best shot is to post this at
the postgresql list, right?

Thanks,

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015/02/16 21:02, Stuart Henderson wrote:
> On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
> > (gdb) bt
>
> Was this backtrace from a new coredump, or was it from one created by
> the old binary? (if the latter, please could you remove the old coredump
> and get it to crash again and send a fresh backtrace?)

OK, replicated it here now...
Re: postgresql-server exiting abnormally after upgrade to -snapshot
j...@wxcvbn.org (Jérémie Courrèges-Anglas) writes:

> Please try the diff below. It fixes the backwards memcpy problem easily
> noticeable with psql -h ::1.

Updated diff. Thanks to Stuart for reminding me that netmask sa_len values
can be quite surprising.

$OpenBSD$
--- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c	Mon Feb 16 23:08:38 2015
@@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
         struct sockaddr_storage addrcopy,
                     maskcopy;
 
-        memcpy(&addrcopy, &addr, sizeof(addrcopy));
-        memcpy(&maskcopy, &mask, sizeof(maskcopy));
+        memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
+        /*
+         * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
+         * can vary wildly. We already know that addr->sa_family == AF_INET,
+         * so just use sizeof(struct sockaddr_in).
+         */
+        memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
         pg_promote_v4_to_v6_addr(&addrcopy);
         pg_promote_v4_to_v6_mask(&maskcopy);
 
-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
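The updated diff picks the copy length from the address family instead of trusting sa_len. A minimal sketch of that idea follows, assuming only AF_INET and AF_INET6 need handling. The helper names sockaddr_size_for_family and copy_sockaddr_by_family are hypothetical and are not part of PostgreSQL or the port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not from PostgreSQL): number of meaningful bytes for
 * a socket address of the given family, or 0 if the family is not handled.
 */
size_t
sockaddr_size_for_family(int family)
{
	switch (family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/*
 * Copy src into a zeroed sockaddr_storage, choosing the length from the
 * family rather than from sa_len. Returns 0 on success, -1 otherwise.
 */
int
copy_sockaddr_by_family(struct sockaddr_storage *dst,
    const struct sockaddr *src)
{
	size_t len = sockaddr_size_for_family(src->sa_family);

	if (len == 0)
		return -1;
	/* The destination is always large enough for a known family. */
	assert(len <= sizeof(*dst));
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	return 0;
}
```

Calling copy_sockaddr_by_family() with a pointer to a struct sockaddr_in reads exactly sizeof(struct sockaddr_in) bytes from the source, whatever length the source happens to advertise.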
Re: postgresql-server exiting abnormally after upgrade to -snapshot
Please try the diff below. It fixes the backwards memcpy problem easily
noticeable with psql -h ::1.

$OpenBSD$
--- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c	Mon Feb 16 21:54:44 2015
@@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
         struct sockaddr_storage addrcopy,
                     maskcopy;
 
-        memcpy(&addrcopy, &addr, sizeof(addrcopy));
-        memcpy(&maskcopy, &mask, sizeof(maskcopy));
+        memcpy(&addrcopy, addr, addr->sa_len);
+        memcpy(&maskcopy, mask, mask->sa_len);
         pg_promote_v4_to_v6_addr(&addrcopy);
         pg_promote_v4_to_v6_mask(&maskcopy);
 
-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
> (gdb) bt

Was this backtrace from a new coredump, or was it from one created by
the old binary? (if the latter, please could you remove the old coredump
and get it to crash again and send a fresh backtrace?)
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-16 20:44, Stuart Henderson wrote:
> > Thanks a lot, it did. I was unaware of make DEBUG, and had been editing
> > the Makefile with no success.
>
> The missing piece is that, normally, binaries get stripped of their debug
> symbols in the fake install stage. Passing the flags in via DEBUG (in most
> cases) avoids this step.
>
> Could you let me have a copy of your pg_hba.conf please? Looking at the
> trace and code it's a bit odd and I'd like to try and replicate it here
> if I can ..

After submitting the backtrace upstream (eg: to the pgsql list), it would seem
that it's an issue in the postgres codebase, triggered by the OpenBSD upgrade
(apparently), but nonetheless an issue in pg itself:

http://www.postgresql.org/message-id/16513.1424120...@sss.pgh.pa.us

I'll post back (for posterity's sake) once I have a permanent fix.

Thanks a bunch for helping me track the issue down and getting a proper
backtrace.

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On Mon, Feb 16, 2015 at 2:19 PM, Hugo Osvaldo Barrera h...@barrera.io wrote:
> [...]
> Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
> Makefile with no success.
>
> (gdb) bt
> #0  0x110a2815b92a in kill () at <stdin>:2
> #1  0x110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
> #2  0x110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
> #3  0x11080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
> #4  0x11080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718
> [...]
>
> This doesn't say much to me though. I guess my best shot is to post this at
> the postgresql list, right?
>
> Thanks,

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/hba.c;h=9cde6a21ce99003102dc9303288001d24e3ba2b6;hb=HEAD#l703

One of these is the offending line...

Refer to http://www.tedunangst.com/flak/post/memcpy-vs-memmove

Guys, please correct me if I am wrong. There might be more such bugs in
postgres, not sure why others are not hitting those.
Re: postgresql-server exiting abnormally after upgrade to -snapshot
Jérémie Courrèges-Anglas wrote:
> Please try the diff below. It fixes the backwards memcpy problem easily
> noticeable with psql -h ::1.
>
> $OpenBSD$
> --- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
> +++ src/backend/libpq/hba.c	Mon Feb 16 21:54:44 2015
> @@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
>          struct sockaddr_storage addrcopy,
>                      maskcopy;
>  
> -        memcpy(&addrcopy, &addr, sizeof(addrcopy));
> -        memcpy(&maskcopy, &mask, sizeof(maskcopy));
> +        memcpy(&addrcopy, addr, addr->sa_len);
> +        memcpy(&maskcopy, mask, mask->sa_len);
>          pg_promote_v4_to_v6_addr(&addrcopy);
>          pg_promote_v4_to_v6_mask(&maskcopy);

How did this ever work? You're changing the source too. This isn't just a
backwards memcpy, it was an overflow.
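The point about "changing the source" can be illustrated with a small sketch. It assumes the removed lines originally read memcpy(&addrcopy, &addr, sizeof(addrcopy)), with the archive having dropped the `&` characters; under that assumption the old code copied from the pointer variable itself rather than from the address it points to. Both function names below are hypothetical, and the second one deliberately copies only sizeof(addr) bytes so that the sketch itself never reads out of bounds.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Copy from the pointee: the bytes copied are the socket address, so the
 * family survives. Returns 1 if the copy carries the same family.
 */
int
copy_from_pointee_keeps_family(const struct sockaddr *addr)
{
	struct sockaddr_storage copy;

	memset(&copy, 0, sizeof(copy));
	memcpy(&copy, addr, sizeof(struct sockaddr_in));
	return copy.ss_family == addr->sa_family;
}

/*
 * Copy from &addr: the bytes copied are the pointer value, not the struct.
 * Only sizeof(addr) bytes are copied here. Asking for
 * sizeof(struct sockaddr_storage) bytes from &addr would read far past the
 * end of the pointer variable. Returns 1 if the bytes are just the pointer.
 */
int
copy_from_pointer_variable_yields_pointer(const struct sockaddr *addr)
{
	unsigned char bytes[sizeof(addr)];
	const struct sockaddr *recovered;

	memcpy(bytes, &addr, sizeof(addr));
	memcpy(&recovered, bytes, sizeof(recovered));
	return recovered == addr;
}
```

Since struct sockaddr_storage is much larger than a pointer, a full-size copy from &addr reads neighbouring stack memory, which is presumably where the overlap with the destination (and hence the "backwards memcpy" report) came from.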
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-16 23:21, Jérémie Courrèges-Anglas wrote:
> j...@wxcvbn.org (Jérémie Courrèges-Anglas) writes:
>
> > Please try the diff below. It fixes the backwards memcpy problem easily
> > noticeable with psql -h ::1.
>
> Updated diff. Thanks to Stuart for reminding me that netmask sa_len values
> can be quite surprising.
>
> $OpenBSD$
> --- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
> +++ src/backend/libpq/hba.c	Mon Feb 16 23:08:38 2015
> @@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
>          struct sockaddr_storage addrcopy,
>                      maskcopy;
>  
> -        memcpy(&addrcopy, &addr, sizeof(addrcopy));
> -        memcpy(&maskcopy, &mask, sizeof(maskcopy));
> +        memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
> +        /*
> +         * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
> +         * can vary wildly. We already know that addr->sa_family == AF_INET,
> +         * so just use sizeof(struct sockaddr_in).
> +         */
> +        memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
>          pg_promote_v4_to_v6_addr(&addrcopy);
>          pg_promote_v4_to_v6_mask(&maskcopy);

I can confirm that this works. The server has been up and running with no
issues for a few hours.

Will anybody be submitting this upstream?

Thanks for all your help!

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-16 21:02, Stuart Henderson wrote:
> On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
> > (gdb) bt
>
> Was this backtrace from a new coredump, or was it from one created by
> the old binary? (if the latter, please could you remove the old coredump
> and get it to crash again and send a fresh backtrace?)

My pg_hba is the stock one (since it had also been deleted):

http://sprunge.us/ZdQI

It was a brand-new core dump, since I had deleted /var/postgresql right before
generating it. I regenerated it just to be sure, and it's the same:

(gdb) bt
#0  0x110a2815b92a in kill () at <stdin>:2
#1  0x110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2  0x110a2816a238 in memcpy (dst0=0xf81bf, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
#3  0x11080cf8d1b1 in check_ip (raddr=0x110abc279918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
#4  0x11080cf90a04 in check_hba (port=0x110abc279800) at hba.c:1718
#5  0x11080cf91d34 in hba_getauthmethod (port=0x110abc279800) at hba.c:2256
#6  0x11080cf88eb3 in ClientAuthentication (port=0x110abc279800) at auth.c:307
#7  0x11080d1edf5d in PerformAuthentication (port=0x110abc279800) at postinit.c:223
#8  0x11080d1eeae7 in InitPostgres (in_dbname=0x110ad7782be0 "virtstart-dev", dboid=0, username=0x110ad7782bc0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
#9  0x11080d0a3eb1 in PostgresMain (argc=1, argv=0x110ad7782c00, dbname=0x110ad7782be0 "virtstart-dev", username=0x110ad7782bc0 "virtstart-dev") at postgres.c:3749
#10 0x11080d033537 in BackendRun (port=Could not find the frame base for "BackendRun".
) at postmaster.c:4155
#11 0x11080d032be8 in BackendStartup (port=0x110abc279800) at postmaster.c:3829
#12 0x11080d02f2d0 in ServerLoop () at postmaster.c:1597
#13 0x11080d02e968 in PostmasterMain (argc=3, argv=0x7f7d9658) at postmaster.c:1244
#14 0x11080cf96dc8 in main (argc=Could not find the frame base for "main".
) at main.c:228
Current language:  auto; currently asm

Thanks,

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
worked out with jca and lteo, this fixes this issue (which only occurs
when there's an ipv6 connection) for me.

Index: Makefile
===================================================================
RCS file: /cvs/ports/databases/postgresql/Makefile,v
retrieving revision 1.198
diff -u -p -r1.198 Makefile
--- Makefile	6 Feb 2015 09:01:21 -0000	1.198
+++ Makefile	16 Feb 2015 22:37:23 -0000
@@ -10,6 +10,7 @@ COMMENT-plpython=Python procedural langu
 # in case a dump before / restore after pkg_add -u is required!
 
 VERSION=	9.4.1
+REVISION-server=	0
 DISTNAME=	postgresql-${VERSION}
 PKGNAME-main=	postgresql-client-${VERSION}
 PKGNAME-server=postgresql-server-${VERSION}
Index: patches/patch-src_backend_libpq_hba_c
===================================================================
RCS file: patches/patch-src_backend_libpq_hba_c
diff -N patches/patch-src_backend_libpq_hba_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_backend_libpq_hba_c	16 Feb 2015 22:37:23 -0000
@@ -0,0 +1,21 @@
+$OpenBSD$
+
+Fix crash when connecting over IPv6. backwards memcpy logged but it's worse.
+Don't copy the whole space for a sockaddr_storage, at this point in the
+code the addr/mask are known to be a sockaddr_in. Not using sa_len because
+in some cases mask->sa_len is too short (suspect this may be an issue
+related to http://marc.info/?l=openbsd-tech&m=138089192205849&w=2).
+
+--- src/backend/libpq/hba.c.orig	Mon Feb  2 20:42:55 2015
++++ src/backend/libpq/hba.c	Mon Feb 16 22:13:26 2015
+@@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
+         struct sockaddr_storage addrcopy,
+                     maskcopy;
+ 
+-        memcpy(&addrcopy, &addr, sizeof(addrcopy));
+-        memcpy(&maskcopy, &mask, sizeof(maskcopy));
++        memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
++        memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
+         pg_promote_v4_to_v6_addr(&addrcopy);
+         pg_promote_v4_to_v6_mask(&maskcopy);
+ 
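The patched branch only runs when an IPv4 rule has to be compared against an IPv6 client address, which fits the crash appearing only on IPv6 connections. A rough sketch of what such a promotion step might look like is below, producing an IPv4-mapped IPv6 address (::ffff:a.b.c.d). The function promote_v4_to_v6 is a hypothetical stand-in written for illustration; it is not the body of pg_promote_v4_to_v6_addr, and on systems that have sin6_len that field would also need to be set.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: build the IPv4-mapped IPv6 form of an IPv4 socket
 * address. Only sizeof(struct sockaddr_in) bytes are read from addr.
 * Returns 0 on success, -1 if addr is not AF_INET.
 */
int
promote_v4_to_v6(struct sockaddr_storage *out, const struct sockaddr *addr)
{
	struct sockaddr_in v4;
	struct sockaddr_in6 v6;

	if (addr->sa_family != AF_INET)
		return -1;

	/* The family says this is a sockaddr_in, so copy exactly that much. */
	memcpy(&v4, addr, sizeof(v4));

	memset(&v6, 0, sizeof(v6));
	v6.sin6_family = AF_INET6;
	v6.sin6_port = v4.sin_port;
	/* ::ffff:a.b.c.d: bytes 10 and 11 are 0xff, bytes 12..15 the IPv4. */
	v6.sin6_addr.s6_addr[10] = 0xff;
	v6.sin6_addr.s6_addr[11] = 0xff;
	memcpy(&v6.sin6_addr.s6_addr[12], &v4.sin_addr.s_addr, 4);

	memset(out, 0, sizeof(*out));
	memcpy(out, &v6, sizeof(v6));
	return 0;
}
```

Promoting 127.0.0.1 this way gives ::ffff:127.0.0.1, which can then be compared against an IPv6 peer address of the same form.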
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-14 02:28, Abel Abraham Camarillo Ojeda wrote:
> On Sat, Feb 14, 2015 at 2:12 AM, Hugo Osvaldo Barrera h...@barrera.io wrote:
> > [...]
> > It looks like the issue is elsewhere, but I've no idea where to look. I've
> > so far failed to build postgresql-server with debug symbols enabled too,
> > but that's just lack of knowledge on my part.
>
> you should give more information about how to reproduce this problem, how
> accurately can you reproduce it, are you sending just a given query and it
> always crashes?

It crashes extremely frequently. I haven't noticed a pattern, and the server
never lives more than a few seconds. No particular query seems to trigger it,
and adding log_statement showed that it may even crash *before* any queries
are executed (see below as well).

> you should get more error context, maybe try log_statement into
> postgresql.conf and try to log all statements and see which one crashes it...
>
> http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html
>
> are you using any custom C extension?

Nope, this is a plain default install from snapshots with nothing extra.

> did you dump and restore database ?
> did you use 'custom format' or 'plain format' ?

My latest tests reproduce the same issue on a clean out-of-the-box db (eg: not
importing any data).

> were there any errors on import? - postgres just warns about some import
> errors, which in my opinion are severe...

This is a log with log_statement and most logging turned on. I'd only run the
server *once* post-initialization before this. The database was completely
empty:

http://sprunge.us/UVGj

While a query managed to get through once, the server usually crashed before
that happens.

Here's another, finer-grained log, with nothing useful (apparently) either:

http://sprunge.us/FQaJ

Thanks,

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On Saturday 14 February 2015, Hugo Osvaldo Barrera wrote:
> On 2015-02-14 02:28, Abel Abraham Camarillo Ojeda wrote:
> > [...]
> > you should give more information about how to reproduce this problem, how
> > accurately can you reproduce it, are you sending just a given query and it
> > always crashes?
>
> It crashes extremely frequently. I haven't noticed a pattern, and the server
> never lives more than a few seconds. No particular query seems to trigger
> it, and adding log_statement showed that it may even crash *before* any
> queries are executed (see below as well).
>
> [...]
>
> This is a log with log_statement and most logging turned on. I'd only run
> the server *once* post-initialization before this. The database was
> completely empty:
>
> http://sprunge.us/UVGj
>
> While a query managed to get through once, the server usually crashed before
> that happens.

The interesting/useful part is:

LOG:  statement: SELECT ... ORDER BY c.oid
LOG:  server process (PID 11531) was terminated by signal 6: Abort trap

So the server process is being sent a SIGABRT, which is causing it to
terminate. There is a good chance that this is coming from the stack
protector, which sends a SIGABRT if the stack is smashed. Is there anything in
dmesg or syslog that correlates?

Failing that, your next step is likely to run it under gdb and get a backtrace
from the point where the SIGABRT occurs. You can also bisect by rolling back
to an older snapshot to see if you can locate the change that has triggered
the issue.

> Here's another, finer-grained log, with nothing useful (apparently) either:
>
> http://sprunge.us/FQaJ
>
> Thanks,
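The "terminated by signal 6: Abort trap" log line is what a parent process learns from waitpid(2) after a child calls abort(3). A small sketch of that mechanism, using a throwaway child process, is below. The function name signal_from_aborting_child is hypothetical and is not taken from the postmaster.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls abort(3) and return the signal that terminated it,
 * or -1 if the fork failed or the child did not die from a signal.
 */
int
signal_from_aborting_child(void)
{
	pid_t pid = fork();
	int status;

	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: avoid leaving a core file behind, then abort. */
		struct rlimit no_core = { 0, 0 };

		setrlimit(RLIMIT_CORE, &no_core);
		abort();
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The parent sees SIGABRT regardless of which library check called abort(), so the signal number alone does not tell the stack protector apart from other causes; the syslog entry or a backtrace is needed for that.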
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-14, Joel Sing j...@sing.id.au wrote:
> The interesting/useful part is:
>
> LOG:  statement: SELECT ... ORDER BY c.oid
> LOG:  server process (PID 11531) was terminated by signal 6: Abort trap
>
> So the server process is being sent a SIGABRT, which is causing it to
> terminate. There is a good chance that this is coming from the stack
> protector, which sends a SIGABRT if the stack is smashed.

Oh, good call. It could also be a backwards memcpy which would show up
in /var/log/messages (assuming usual config).

If it were another program, our strict mutex checks can also cause
SIGABRT, but that won't apply to pgsql as it's not threaded.
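memcpy(3) is undefined when source and destination overlap, which is the condition the "backwards memcpy" report is about. A minimal sketch of an overlap test follows. It is written for illustration only and is not OpenBSD's libc check; regions_overlap and copy_checked are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if [dst, dst+len) and [src, src+len)
 * share at least one byte, 0 otherwise.
 */
int
regions_overlap(const void *dst, const void *src, size_t len)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;

	if (len == 0)
		return 0;
	return (d >= s && d - s < len) || (s > d && s - d < len);
}

/*
 * Copy len bytes, using memmove(3) when the regions overlap and memcpy(3)
 * otherwise.
 */
void *
copy_checked(void *dst, const void *src, size_t len)
{
	if (regions_overlap(dst, src, len))
		return memmove(dst, src, len);
	return memcpy(dst, src, len);
}
```

A libc that aborts on overlap turns a silent bug into a crash with a syslog entry; a caller that really needs overlapping copies should call memmove(3) directly.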
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-13 13:20, Stuart Henderson wrote:
> [...]
> I have the login.conf section from the example in the pkg-readme,
>
> postgresql:\
>         :openfiles-cur=768:\
>         :tc=daemon:
>
> and this in sysctl.conf
>
> # postgresql
> kern.seminfo.semmni=256
> kern.seminfo.semmns=2048
> kern.shminfo.shmmax=50331648
>
> sthen@hutch:~:532$ ls -l /bin/ls /usr/local/bin/postgres
> -r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
> -r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*
> sthen@hutch:~:533$ sysctl kern.version
> kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
>     t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC

Thanks for all the details. It looks like almost everything is identical
except our kernels (I had a few extra fields in sysctl.conf edited for pg, but
reverted them just to make sure they weren't screwing up).

# sysctl kern.version
kern.version=OpenBSD 5.7-beta (GENERIC.MP) #852: Tue Feb 10 16:31:16 MST 2015
    t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

I switched to the SP kernel just to discard any possible regressions that
might be affecting this scenario, but no change.

It looks like the issue is elsewhere, but I've no idea where to look. I've so
far failed to build postgresql-server with debug symbols enabled too, but
that's just lack of knowledge on my part.

-- 
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On Sat, Feb 14, 2015 at 2:12 AM, Hugo Osvaldo Barrera h...@barrera.io wrote:
> On 2015-02-13 13:20, Stuart Henderson wrote:
> > [...]
>
> Thanks for all the details. It looks like almost everything is identical
> except our kernels (I had a few extra fields in sysctl.conf edited for pg,
> but reverted them just to make sure they weren't screwing up).
>
> # sysctl kern.version
> kern.version=OpenBSD 5.7-beta (GENERIC.MP) #852: Tue Feb 10 16:31:16 MST 2015
>     t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> I switched to the SP kernel just to discard any possible regressions that
> might be affecting this scenario, but no change.
>
> It looks like the issue is elsewhere, but I've no idea where to look. I've
> so far failed to build postgresql-server with debug symbols enabled too,
> but that's just lack of knowledge on my part.

you should give more information about how to reproduce this problem, how
accurately can you reproduce it, are you sending just a given query and it
always crashes?

you should get more error context, maybe try log_statement into
postgresql.conf and try to log all statements and see which one crashes it...

http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html

are you using any custom C extension?

did you dump and restore database ?
did you use 'custom format' or 'plain format' ?

were there any errors on import? - postgres just warns about some import
errors, which in my opinion are severe...
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-14 13:29, Stuart Henderson wrote: On 2015-02-14, Joel Sing j...@sing.id.au wrote: The interesting/useful part is: LOG: statement: SELECT ... ORDER BY c.oid LOG: server process (PID 11531) was terminated by signal 6: Abort trap So the server process is being sent a SIGABRT, which is causing it to terminate. There is a good chance this this is coming from the stack protector, which sends a SIGABRT if the stack is smashed. Oh, good call. It could also be a backwards memcpy which would show up in /var/log/messages (assuming usual config). Yup, backward memcpy it is (from /var/log/messages): Feb 14 12:27:34 elysion postgres: backwards memcpy Feb 14 12:28:10 elysion last message repeated 8 times Feb 14 12:30:19 elysion last message repeated 28 times Feb 14 12:40:28 elysion last message repeated 128 times Feb 14 12:50:40 elysion last message repeated 128 times Feb 14 13:00:41 elysion last message repeated 126 times Feb 14 13:10:42 elysion last message repeated 128 times Feb 14 13:20:49 elysion last message repeated 126 times Feb 14 13:30:55 elysion last message repeated 128 times Feb 14 13:41:06 elysion last message repeated 132 times Feb 14 13:51:10 elysion last message repeated 128 times Feb 14 14:01:18 elysion last message repeated 128 times Feb 14 14:08:18 elysion last message repeated 91 times Am I mistaken in understanding that this is an issue with postgresql itself, and not a local configuration error? I tried building postgres with debug symbols (I added the flags described here[1] to the ports Makefile), but the backtrace is still useless: # sudo -u _postgresql gdb -q -c postgres.core /usr/local/bin/postgres Core was generated by `postgres'. Program terminated with signal 6, Aborted. Loaded symbols for /usr/local/bin/postgres #0 0x0bd73424292a in ?? () (gdb) bt #0 0x0bd73424292a in ?? () #1 0x in ?? () Do I need any further OpenBSD-specific changes to get a useful backtrace? (I've to admit that I'm too familiar with debuging with gdb on any platform). 
Thanks for all the feedback so far!

[1]: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Debugging_the_core_dump_-_example

-- Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
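The "backwards memcpy" entries in /var/log/messages suggest that memcpy was called with source and destination regions that overlap, which the C standard does not allow for memcpy (memmove is the call that permits it). The following is only a rough sketch of such an overlap check, not the libc source; `regions_overlap` and `copy_checked` are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the byte ranges [a, a+len) and [b, b+len) share at least
 * one byte, 0 otherwise. Addresses are compared as integers so that
 * unrelated pointers can be compared safely.
 */
static int
regions_overlap(const void *a, const void *b, size_t len)
{
	uintptr_t pa = (uintptr_t)a;
	uintptr_t pb = (uintptr_t)b;

	if (len == 0)
		return 0;
	return (pa <= pb) ? (pb - pa < len) : (pa - pb < len);
}

/*
 * Hypothetical helper: copy len bytes, falling back to memmove when the
 * regions overlap instead of invoking memcpy with invalid arguments.
 */
static void *
copy_checked(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	if (regions_overlap(dst, src, len))
		return memmove(dst, src, len);
	return memcpy(dst, src, len);
}
```

For example, with `char buf[8] = "abcdefg";`, the call `copy_checked(buf + 2, buf, 4)` takes the memmove path because the two 4-byte ranges share bytes 2 and 3. A copy whose length is larger than the source object can trip the same kind of check even when the buffers themselves are distinct.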
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-12, Hugo Osvaldo Barrera h...@barrera.io wrote:

On 2015-02-12 10:18, Stuart Henderson wrote:

On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:

Can someone else confirm postgres9.4 works fine on the latest -snapshot? (the confirmation would be helpful to reaffirm that it's not an issue with some dependency or library).

Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.

Ok, so now I know that the issue is on my end. Which leaves me even more confused. You're running the latest snapshots too, right? (e.g. the ones from Feb 10th?) Aside from a clean install, do you have any more changes? Perhaps login.conf?

I have the login.conf section from the example in the pkg-readme,

postgresql:\
	:openfiles-cur=768:\
	:tc=daemon:

and this in sysctl.conf

# postgresql
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmmax=50331648

sthen@hutch:~:532$ ls -l /bin/ls /usr/local/bin/postgres
-r-xr-xr-x 1 root bin 267968 Feb 10 23:19 /bin/ls*
-r-xr-xr-x 1 root bin 6508711 Feb 9 03:21 /usr/local/bin/postgres*
sthen@hutch:~:533$ sysctl kern.version
kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
    t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:

Can someone else confirm postgres9.4 works fine on the latest -snapshot? (the confirmation would be helpful to reaffirm that it's not an issue with some dependency or library).

Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-12 10:18, Stuart Henderson wrote:

On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:

Can someone else confirm postgres9.4 works fine on the latest -snapshot? (the confirmation would be helpful to reaffirm that it's not an issue with some dependency or library).

Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.

Ok, so now I know that the issue is on my end. Which leaves me even more confused. You're running the latest snapshots too, right? (e.g. the ones from Feb 10th?) Aside from a clean install, do you have any more changes? Perhaps login.conf?

Thanks,

-- Hugo Osvaldo Barrera
Re: postgresql-server exiting abnormally after upgrade to -snapshot
On 2015-02-11 19:54, Jan Stary wrote:

On Feb 11 14:49:17, h...@barrera.io wrote:

Hi,

I upgraded to -snapshot today, and did all the proper postgresql upgrade steps: pg_dump, moved the old db out of the way, re-init'd, started, and imported. The thing is, upon receiving connections, postgres dies horribly. The log is just the following, repeating over and over:

WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2015-02-11 17:01:00 GMT
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/1696370
LOG: redo is not required
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
LOG: server process (PID 9444) was terminated by signal 6: Abort trap
LOG: terminating any other active server processes

After much frustration (even building -current), I deleted all of it, uninstalled, built 9.3.4 using the old ports recipe, installed - same issue! It's clearly not an upgrade issue, because deleting all the data files and going back to 9.3 has the same issue.

Have you stopped the DB server before performing the upgrade? Are you sure (pgrep -fl post) that there is no other server process around?

Jan

Yes, I did. I also did this when installing the version I built from ports (which I also tried with no change). I actually did the entire process a few times, with -snapshots, -current and installing from packages. All exhibited the same behaviour, so I'm starting to suspect the issue is not postgres per se.

Has anyone else had this issue, or similar issues with -snapshot/-current?

Can someone else confirm postgres9.4 works fine on the latest -snapshot? (the confirmation would be helpful to reaffirm that it's not an issue with some dependency or library).

Thanks,

-- Hugo Osvaldo Barrera
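The "terminated by signal 6: Abort trap" line in the log is what a parent process can report after one of its children calls abort(). As a minimal sketch of that mechanism (not PostgreSQL's postmaster code; `run_and_get_signal` and `crash_like_backend` are hypothetical names for this illustration), a parent can fork a child and decode the wait status like this:

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a server process that hits a fatal libc check. */
static void
crash_like_backend(void)
{
	abort();	/* raises SIGABRT in the calling process */
}

/*
 * Run child_fn in a forked child. Return the number of the signal that
 * terminated the child, 0 if it exited normally, or -1 on error.
 */
static int
run_and_get_signal(void (*child_fn)(void))
{
	pid_t pid;
	int status;

	assert(child_fn != NULL);

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: disable core dumps so the demo leaves no core file. */
		struct rlimit no_core = { 0, 0 };

		setrlimit(RLIMIT_CORE, &no_core);
		child_fn();
		_exit(0);
	}
	/* Parent: wait for the child and inspect how it ended. */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling `run_and_get_signal(crash_like_backend)` is expected to return SIGABRT, which is signal number 6 on OpenBSD and matches the number in the log. The signal number alone does not say which check called abort(), which is why the /var/log/messages entry or a core file with symbols is still needed.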