Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-21 Thread Stuart Henderson
On 2015-02-17, Ted Unangst t...@tedunangst.com wrote:
 -memcpy(&addrcopy, &addr, sizeof(addrcopy));
 -memcpy(&maskcopy, &mask, sizeof(maskcopy));
 +memcpy(&addrcopy, addr, addr->sa_len);
 +memcpy(&maskcopy, mask, mask->sa_len);

 How did this ever work?

It didn't.



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-17 Thread Craig Skinner
On 2015-02-16 Mon 18:19 PM |, Hugo Osvaldo Barrera wrote:
   #3  0x11080cf8d1b1 in check_ip (raddr=0x110abc279918, 
 addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704

Is this an IPv6 thing?

Until recently, Squid crashed likewise:

Squid bug:  4024
Status: RESOLVED FIXED

log a warning but not abort if ::1 and *only* ::1 has failed to resolve.

http://marc.info/?l=openbsd-ports&m=141339262226378&w=2



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Stuart Henderson
On 2015-02-15, Hugo Osvaldo Barrera h...@barrera.io wrote:

 Am I mistaken in understanding that this is an issue with postgresql itself,
 and not a local configuration error?

Correct.

 I tried building postgres with debug symbols (I added the flags described
 here[1] to the ports Makefile), but the backtrace is still useless:

Please would you rebuild from the original port like this:

make clean=all
make DEBUG="-O0 -g" repackage && sudo make reinstall

and see if this gives a better backtrace.



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Hugo Osvaldo Barrera
On 2015-02-16 16:24, Stuart Henderson wrote:
 On 2015-02-15, Hugo Osvaldo Barrera h...@barrera.io wrote:
 
  Am I mistaken in understanding that this is an issue with postgresql
  itself, and not a local configuration error?

 Correct.

  I tried building postgres with debug symbols (I added the flags described
  here[1] to the ports Makefile), but the backtrace is still useless:

 Please would you rebuild from the original port like this:

 make clean=all
 make DEBUG="-O0 -g" repackage && sudo make reinstall

 and see if this gives a better backtrace.


Thanks a lot, it did. I was unaware of make DEBUG, and had been editing the
Makefile with no success.

  (gdb) bt
  #0  0x110a2815b92a in kill () at stdin:2
  #1  0x110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
  #2  0x110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
  #3  0x11080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
  #4  0x11080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718
  #5  0x11080cf91d34 in hba_getauthmethod (port=0x110a899f7800) at hba.c:2256
  #6  0x11080cf88eb3 in ClientAuthentication (port=0x110a899f7800) at auth.c:307
  #7  0x11080d1edf5d in PerformAuthentication (port=0x110a899f7800) at postinit.c:223
  #8  0x11080d1eeae7 in InitPostgres (in_dbname=0x110af4508c00 "virtstart-dev", dboid=0, username=0x110af4508be0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
  #9  0x11080d0a3eb1 in PostgresMain (argc=1, argv=0x110af4508c20, dbname=0x110af4508c00 "virtstart-dev", username=0x110af4508be0 "virtstart-dev") at postgres.c:3749
  #10 0x11080d033537 in BackendRun (port=Could not find the frame base for BackendRun.
  ) at postmaster.c:4155
  #11 0x11080d032be8 in BackendStartup (port=0x110a899f7800) at postmaster.c:3829
  #12 0x11080d02f2d0 in ServerLoop () at postmaster.c:1597
  #13 0x11080d02e968 in PostmasterMain (argc=3, argv=0x7f7d9658) at postmaster.c:1244
  #14 0x11080cf96dc8 in main (argc=Could not find the frame base for main.
  ) at main.c:228
  Current language:  auto; currently asm

This doesn't say much to me, though. I guess my best shot is to post this to
the postgresql list, right?

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

[demime 1.01d removed an attachment of type application/pgp-signature which had 
a name of signature.asc]



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Stuart Henderson
On 2015/02/16 21:02, Stuart Henderson wrote:
 On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
(gdb) bt
 
 Was this backtrace from a new coredump, or was it from one created by
 the old binary? (if the latter, please could you remove the old coredump
 and get it to crash again and send a fresh backtrace?)
 

OK, replicated it here now...



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Jérémie Courrèges-Anglas
j...@wxcvbn.org (Jérémie Courrèges-Anglas) writes:

 Please try the diff below.  It fixes the backwards memcpy problem
 easily noticeable with psql -h ::1.

Updated diff. Thanks to Stuart for reminding me that netmasks' sa_len
values can be quite surprising.

$OpenBSD$
--- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c	Mon Feb 16 23:08:38 2015
@@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
 		struct sockaddr_storage addrcopy,
 					maskcopy;
 
-		memcpy(&addrcopy, &addr, sizeof(addrcopy));
-		memcpy(&maskcopy, &mask, sizeof(maskcopy));
+		memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
+		/*
+		 * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
+		 * can vary wildly. We already know that addr->sa_family == AF_INET,
+		 * so just use sizeof(struct sockaddr_in).
+		 */
+		memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
 		pg_promote_v4_to_v6_addr(&addrcopy);
 		pg_promote_v4_to_v6_mask(&maskcopy);
 


-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
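
The comment in the updated diff says that sa_len cannot be trusted for netmasks, so the copy length is derived from the address family instead. A minimal sketch of that idea in C follows. The helper name copy_len_for is hypothetical; it is not part of hba.c or of the patch, and it only covers the two families discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not in hba.c): decide how many bytes to copy from
 * a sockaddr by looking at sa_family rather than sa_len.
 * Returns 0 for address families this sketch does not handle.
 */
static size_t
copy_len_for(const struct sockaddr *sa)
{
	assert(sa != NULL);

	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}
```

Both sizes are no larger than sizeof(struct sockaddr_storage), so a destination of that type can hold either result.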



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Jérémie Courrèges-Anglas
Please try the diff below.  It fixes the backwards memcpy problem
easily noticeable with psql -h ::1.

$OpenBSD$
--- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
+++ src/backend/libpq/hba.c	Mon Feb 16 21:54:44 2015
@@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
 		struct sockaddr_storage addrcopy,
 					maskcopy;
 
-		memcpy(&addrcopy, &addr, sizeof(addrcopy));
-		memcpy(&maskcopy, &mask, sizeof(maskcopy));
+		memcpy(&addrcopy, addr, addr->sa_len);
+		memcpy(&maskcopy, mask, mask->sa_len);
 		pg_promote_v4_to_v6_addr(&addrcopy);
 		pg_promote_v4_to_v6_mask(&maskcopy);
 


-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Stuart Henderson
On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
   (gdb) bt

Was this backtrace from a new coredump, or was it from one created by
the old binary? (if the latter, please could you remove the old coredump
and get it to crash again and send a fresh backtrace?)



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Hugo Osvaldo Barrera
On 2015-02-16 20:44, Stuart Henderson wrote:
  Thanks a lot, it did. I was unaware of make DEBUG, and had been editing
  the Makefile with no success.

 The missing piece is that, normally, binaries get stripped of their
 debug symbols in the fake install stage. Passing the flags in via DEBUG
 (in most cases) avoids this step.

 Could you let me have a copy of your pg_hba.conf please? Looking at the
 trace and code it's a bit odd and I'd like to try and replicate it here if
 I can ..


After submitting the backtrace upstream (i.e. to the pgsql list), it would
seem that it's an issue in the postgres codebase, triggered by the OpenBSD
upgrade (apparently), but nonetheless an issue in pg itself:

  http://www.postgresql.org/message-id/16513.1424120...@sss.pgh.pa.us

I'll post back (for posterity's sake) once I have a permanent fix.

Thanks a bunch for helping me track the issue down and get a proper
backtrace.

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Amit Kulkarni
On Mon, Feb 16, 2015 at 2:19 PM, Hugo Osvaldo Barrera h...@barrera.io wrote:

   #3  0x11080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/libpq/hba.c;h=9cde6a21ce99003102dc9303288001d24e3ba2b6;hb=HEAD#l703

One of these is the offending line...
Refer to http://www.tedunangst.com/flak/post/memcpy-vs-memmove

Guys, please correct me if I am wrong. There might be more such bugs in
postgres, not sure why others are not hitting those.



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Ted Unangst
Jérémie Courrèges-Anglas wrote:
 Please try the diff below.  It fixes the backwards memcpy problem
 easily noticeable with psql -h ::1.
 
 $OpenBSD$
 --- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
 +++ src/backend/libpq/hba.c	Mon Feb 16 21:54:44 2015
 @@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
  		struct sockaddr_storage addrcopy,
  					maskcopy;
  
 -		memcpy(&addrcopy, &addr, sizeof(addrcopy));
 -		memcpy(&maskcopy, &mask, sizeof(maskcopy));
 +		memcpy(&addrcopy, addr, addr->sa_len);
 +		memcpy(&maskcopy, mask, mask->sa_len);
  		pg_promote_v4_to_v6_addr(&addrcopy);
  		pg_promote_v4_to_v6_mask(&maskcopy);

How did this ever work? You're changing the source too. This isn't just a
backwards memcpy, it was an overflow.
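
The difference between using the pointer variable and using what it points to as the memcpy source can be shown in a small, bounded C sketch. The function names and the PTR_BYTES macro are made up for this illustration; only pointer-sized chunks are copied so that nothing reads out of bounds, whereas the code under discussion copied sizeof(struct sockaddr_storage) bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Number of bytes copied by both helpers: the size of one pointer. */
#define PTR_BYTES sizeof(const struct sockaddr *)

/*
 * Source is the address of the pointer variable itself, so the bytes
 * that land in out are the pointer value, not any sockaddr contents.
 */
static void
copy_from_pointer_variable(unsigned char *out, const struct sockaddr *addr)
{
	assert(out != NULL);
	memcpy(out, &addr, PTR_BYTES);
}

/* Source is the sockaddr that the pointer refers to. */
static void
copy_from_pointee(unsigned char *out, const struct sockaddr *addr)
{
	assert(out != NULL && addr != NULL);
	memcpy(out, addr, PTR_BYTES);
}
```

With the first form, asking for more than PTR_BYTES reads whatever happens to sit next to the pointer variable on the stack, which fits the remark that the old code was an overflow and not only an overlapping copy.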



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Hugo Osvaldo Barrera
On 2015-02-16 23:21, Jérémie Courrèges-Anglas wrote:
 j...@wxcvbn.org (Jérémie Courrèges-Anglas) writes:

  Please try the diff below.  It fixes the backwards memcpy problem
  easily noticeable with psql -h ::1.

 Updated diff. Thanks to Stuart for reminding me that netmasks sa_len
 values can be much surprising.

 $OpenBSD$
 --- src/backend/libpq/hba.c.orig	Mon Feb 16 21:53:21 2015
 +++ src/backend/libpq/hba.c	Mon Feb 16 23:08:38 2015
 @@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
  		struct sockaddr_storage addrcopy,
  					maskcopy;
  
 -		memcpy(&addrcopy, &addr, sizeof(addrcopy));
 -		memcpy(&maskcopy, &mask, sizeof(maskcopy));
 +		memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
 +		/*
 +		 * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
 +		 * can vary wildly. We already know that addr->sa_family == AF_INET,
 +		 * so just use sizeof(struct sockaddr_in).
 +		 */
 +		memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
  		pg_promote_v4_to_v6_addr(&addrcopy);
  		pg_promote_v4_to_v6_mask(&maskcopy);


I can confirm that this works. The server has been up and running with no
issues for a few hours.

Will anybody be submitting this upstream?

Thanks for all your help!

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Hugo Osvaldo Barrera
On 2015-02-16 21:02, Stuart Henderson wrote:
 On 2015/02/16 17:19, Hugo Osvaldo Barrera wrote:
(gdb) bt

 Was this backtrace from a new coredump, or was it from one created by
 the old binary? (if the latter, please could you remove the old coredump
 and get it to crash again and send a fresh backtrace?)


My pg_hba is the stock one (since it had also been deleted):
http://sprunge.us/ZdQI

It was a brand-new core dump, since I had deleted /var/postgresql right
before generating it. I regenerated it just to be sure, and it's the same:

  (gdb) bt
  #0  0x110a2815b92a in kill () at stdin:2
  #1  0x110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
  #2  0x110a2816a238 in memcpy (dst0=0xf81bf, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
  #3  0x11080cf8d1b1 in check_ip (raddr=0x110abc279918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
  #4  0x11080cf90a04 in check_hba (port=0x110abc279800) at hba.c:1718
  #5  0x11080cf91d34 in hba_getauthmethod (port=0x110abc279800) at hba.c:2256
  #6  0x11080cf88eb3 in ClientAuthentication (port=0x110abc279800) at auth.c:307
  #7  0x11080d1edf5d in PerformAuthentication (port=0x110abc279800) at postinit.c:223
  #8  0x11080d1eeae7 in InitPostgres (in_dbname=0x110ad7782be0 "virtstart-dev", dboid=0, username=0x110ad7782bc0 "virtstart-dev", out_dbname=0x0) at postinit.c:688
  #9  0x11080d0a3eb1 in PostgresMain (argc=1, argv=0x110ad7782c00, dbname=0x110ad7782be0 "virtstart-dev", username=0x110ad7782bc0 "virtstart-dev") at postgres.c:3749
  #10 0x11080d033537 in BackendRun (port=Could not find the frame base for BackendRun.
  ) at postmaster.c:4155
  #11 0x11080d032be8 in BackendStartup (port=0x110abc279800) at postmaster.c:3829
  #12 0x11080d02f2d0 in ServerLoop () at postmaster.c:1597
  #13 0x11080d02e968 in PostmasterMain (argc=3, argv=0x7f7d9658) at postmaster.c:1244
  #14 0x11080cf96dc8 in main (argc=Could not find the frame base for main.
  ) at main.c:228
  Current language:  auto; currently asm

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-16 Thread Stuart Henderson
Worked out with jca and lteo; this fixes the issue (which only occurs
when there's an IPv6 connection) for me.

Index: Makefile
===================================================================
RCS file: /cvs/ports/databases/postgresql/Makefile,v
retrieving revision 1.198
diff -u -p -r1.198 Makefile
--- Makefile	6 Feb 2015 09:01:21 -0000	1.198
+++ Makefile	16 Feb 2015 22:37:23 -0000
@@ -10,6 +10,7 @@ COMMENT-plpython=Python procedural langu
 # in case a dump before / restore after pkg_add -u is required!
 
 VERSION=	9.4.1
+REVISION-server= 0
 DISTNAME=	postgresql-${VERSION}
 PKGNAME-main=	postgresql-client-${VERSION}
 PKGNAME-server=postgresql-server-${VERSION}
Index: patches/patch-src_backend_libpq_hba_c
===================================================================
RCS file: patches/patch-src_backend_libpq_hba_c
diff -N patches/patch-src_backend_libpq_hba_c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_backend_libpq_hba_c	16 Feb 2015 22:37:23 -0000
@@ -0,0 +1,21 @@
+$OpenBSD$
+
+Fix crash when connecting over IPv6. backwards memcpy logged but it's worse.
+Don't copy the whole space for a sockaddr_storage, at this point in the
+code the addr/mask are known to be a sockaddr_in. Not using sa_len because
+in some cases mask->sa_len is too short (suspect this may be an issue
+related to http://marc.info/?l=openbsd-tech&m=138089192205849&w=2).
+
+--- src/backend/libpq/hba.c.orig	Mon Feb  2 20:42:55 2015
++++ src/backend/libpq/hba.c	Mon Feb 16 22:13:26 2015
+@@ -700,8 +700,8 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, stru
+ 		struct sockaddr_storage addrcopy,
+ 					maskcopy;
+ 
+-		memcpy(&addrcopy, &addr, sizeof(addrcopy));
+-		memcpy(&maskcopy, &mask, sizeof(maskcopy));
++		memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
++		memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
+ 		pg_promote_v4_to_v6_addr(&addrcopy);
+ 		pg_promote_v4_to_v6_mask(&maskcopy);
+ 
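
The copy that the port patch settles on can be sketched as a standalone C function. This is an illustration under assumptions, not the PostgreSQL code: the name copy_v4_into_storage is hypothetical, and zeroing the destination first is an addition of this sketch that the patch itself does not make.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Sketch of the patched copy. The caller has already established that
 * src is an AF_INET address, so exactly sizeof(struct sockaddr_in) bytes
 * are read from it and the larger sockaddr_storage is never over-read.
 */
static void
copy_v4_into_storage(struct sockaddr_storage *dst, const struct sockaddr *src)
{
	assert(dst != NULL && src != NULL);
	assert(src->sa_family == AF_INET);

	memset(dst, 0, sizeof(*dst));	/* sketch-only: start from a clean buffer */
	memcpy(dst, src, sizeof(struct sockaddr_in));
}
```

Because the length is a compile-time constant tied to the known address family, it does not depend on whatever sa_len the mask happens to carry.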



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Hugo Osvaldo Barrera
On 2015-02-14 02:28, Abel Abraham Camarillo Ojeda wrote:

 you should give more information about how to reproduce this problem,
 how accurately can you reproduce it, are you sending just a given query
 and it always crashes?


It crashes extremely frequently. I haven't noticed a pattern, and the
server never lives more than a few seconds. No particular query seems to
trigger it, and adding log_statement showed that it may even crash *before*
any queries are executed (see below as well).

 you should get more error context, maybe try log_statement into
 postgresql.conf and try to log all statements and see which one crashes it...

 http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html

 are you using any custom C extension?


Nope, this is a plain default install from snapshots with nothing extra.

 did you dump and restore database ? did you use 'custom format' or
 'plain format' ?

My latest tests reproduce the same issue on a clean out-of-the-box db (eg:
not importing any data).

 were there any errors on import? - postgres just warns about some
 import errors, which in my opinion are severe...

This is a log with log_statement and most logging turned on. I'd only run
the server *once* post-initialization before this. The database was
completely empty:

http://sprunge.us/UVGj

While a query managed to get through once, the server usually crashes before
that happens.

Here's another, finer-grained log, with nothing useful (apparently) either:

http://sprunge.us/FQaJ

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Joel Sing
On Saturday 14 February 2015, Hugo Osvaldo Barrera wrote:
 This is a log with log_statement and most logging turned on. I'd only run
 the server *once* post-initialization before this. The database was
 completely empty:

 http://sprunge.us/UVGj

 While a query managed to get through once, the server usually crashes
 before that happens.

The interesting/useful part is:

LOG:  statement: SELECT ... ORDER BY c.oid
LOG:  server process (PID 11531) was terminated by signal 6: Abort trap

So the server process is being sent a SIGABRT, which is causing it to 
terminate. There is a good chance that this is coming from the stack
protector, which sends a SIGABRT if the stack is smashed.

Is there anything in dmesg or syslog that correlates?

Failing that your next step is likely to run it under gdb and get a backtrace 
from the point where the SIGABRT occurs. You can also bisect by rolling back 
to an older snapshot to see if you can locate the change that has triggered 
the issue.
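
The log line "terminated by signal 6: Abort trap" is what a parent process sees when a child dies from abort(3). A minimal sketch in C of how a parent can observe this with waitpid(2) follows; the function name is hypothetical and this is not how the postmaster is implemented, only an illustration of the mechanism.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls abort(3) and report the signal that killed it.
 * Returns the signal number, or -1 if fork/waitpid failed or the child
 * did not die from a signal.
 */
static int
signal_from_aborting_child(void)
{
	pid_t pid = fork();
	int status;

	if (pid == -1)
		return -1;
	if (pid == 0)
		abort();	/* child: raises SIGABRT and does not return */

	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The same SIGABRT is reported whichever check called abort(3), which is why a backtrace from the aborting process is needed to tell the possible causes apart.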

 Here's another, finer-grained log, with nothing useful (apparently) either:

 http://sprunge.us/FQaJ


Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Stuart Henderson
On 2015-02-14, Joel Sing j...@sing.id.au wrote:
 The interesting/useful part is:

 LOG:  statement: SELECT ... ORDER BY c.oid
 LOG:  server process (PID 11531) was terminated by signal 6: Abort trap

 So the server process is being sent a SIGABRT, which is causing it to 
 terminate. There is a good chance that this is coming from the stack
 protector, which sends a SIGABRT if the stack is smashed.

Oh, good call. It could also be a backwards memcpy which would show
up in /var/log/messages (assuming usual config).

If it were another program, our strict mutex checks can also cause
SIGABRT, but that won't apply to pgsql as it's not threaded.
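
A "backwards memcpy" report means the source and destination ranges of a copy overlap. A rough C sketch of such an overlap test is shown below. The function name is hypothetical and this is not the libc source; a libc that enforces the rule would log and call abort(3) where this sketch simply returns 1, and this version also counts identical pointers as overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the byte ranges [dst, dst+len) and [src, src+len) share at
 * least one byte, 0 otherwise. A zero-length copy never overlaps.
 */
static int
ranges_overlap(const void *dst, const void *src, size_t len)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;

	if (len == 0)
		return 0;
	if (d >= s)
		return (d - s) < len;
	return (s - d) < len;
}
```

When the ranges may legitimately overlap, memmove(3) is the call to use; memcpy(3) requires them to be disjoint.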



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Hugo Osvaldo Barrera
On 2015-02-13 13:20, Stuart Henderson wrote:
 On 2015-02-12, Hugo Osvaldo Barrera h...@barrera.io wrote:
  On 2015-02-12 10:18, Stuart Henderson wrote:
  On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:
   Can someone else confirm postgres9.4 works fine on the latest -snapshot?
   (the confirmation would be helpful to reaffirm that it's not an issue
   with some dependency or library).
 
  Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on
  amd64.
 
 
  Ok, so now I know that the issue is on my end. Which leaves me even more
  confused. You're running the latest snapshots too, right? (eg: the ones
  from feb 10th?).
 
  Aside from a clean install, do you have any more changes? Perhaps
  login.conf?

 I have the login.conf section from the example in the pkg-readme,

 postgresql:\
 :openfiles-cur=768:\
 :tc=daemon:

 and this in sysctl.conf

 # postgresql
 kern.seminfo.semmni=256
 kern.seminfo.semmns=2048
 kern.shminfo.shmmax=50331648

 sthen@hutch:~:532$ ls -l /bin/ls /usr/local/bin/postgres
 -r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
 -r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*

 sthen@hutch:~:533$ sysctl kern.version
 kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
 t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC


Thanks for all the details. It looks like almost everything is identical
except our kernels (I had a few extra fields in sysctl.conf edited for pg,
but reverted them just to make sure they weren't screwing up).

  # sysctl kern.version
  kern.version=OpenBSD 5.7-beta (GENERIC.MP) #852: Tue Feb 10 16:31:16 MST 2015
  t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

I switched to the SP kernel just to rule out any possible regressions that
might be affecting this scenario, but no change.

It looks like the issue is elsewhere, but I've no idea where to look. I've so
far failed to build postgresql-server with debug symbols enabled too, but
that's just lack of knowledge on my part.

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Abel Abraham Camarillo Ojeda
On Sat, Feb 14, 2015 at 2:12 AM, Hugo Osvaldo Barrera h...@barrera.io wrote:
 On 2015-02-13 13:20, Stuart Henderson wrote:
 On 2015-02-12, Hugo Osvaldo Barrera h...@barrera.io wrote:
  On 2015-02-12 10:18, Stuart Henderson wrote:
  On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:
   Can
   someone else confirm postgres9.4 work fine on the latest -snapshot?
 (the
   confirmation would be helpful to reafirm that it's not an issue with
 some
   dependency or library).
 
  Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on
 amd64.
 
 
  Ok, so now I know that the issue is on my end. Which leaves me even more
  confused. You're running the latest snapshots too, right? (eg: the ones
 from
  feb 10th?).
 
  Aside from a clean install, do you have any more changes? Perhaps
 login.conf?

 I have the login.conf section from the example in the pkg-readme,

 postgresql:\
 :openfiles-cur=768:\
 :tc=daemon:

 and this in sysctl.conf

 # postgresql
 kern.seminfo.semmni=256
 kern.seminfo.semmns=2048
 kern.shminfo.shmmax=50331648

 sthen@hutch:~:532$ ls -l /bin/ls /usr/local/bin/postgres
 -r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
 -r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*

 sthen@hutch:~:533$ sysctl kern.version
 kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
 t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC


 Thanks for all the details. It looks like almost everything is identical
 except our kernels (I had a few extra fields in sysctl.conf edited for pg,
 but reverted them just to make sure they weren't screwing things up).

   # sysctl kern.version
   kern.version=OpenBSD 5.7-beta (GENERIC.MP) #852: Tue Feb 10 16:31:16 MST 2015
   t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

 I switched to the SP kernel just to rule out any possible regressions that
 might be affecting this scenario, but no change.

 It looks like the issue is elsewhere, but I've no idea where to look. I've so
 far failed to build postgresql-server with debug symbols enabled too, but
 that's just lack of knowledge on my part.

 --
 Hugo Osvaldo Barrera
 A: Because we read from top to bottom, left to right.
 Q: Why should I start my reply below the quoted text?




You should give more information about how to reproduce this problem.
How reliably can you reproduce it? Are you sending just one particular query,
and does it always crash?

You should also get more error context: maybe set log_statement in
postgresql.conf to log all statements and see which one crashes it...

http://www.postgresql.org/docs/9.4/static/runtime-config-logging.html

Are you using any custom C extension?

Did you dump and restore the database? Did you use 'custom format' or
'plain format'?
Were there any errors on import? postgres only warns about some import
errors which, in my opinion, are severe...



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-14 Thread Hugo Osvaldo Barrera
On 2015-02-14 13:29, Stuart Henderson wrote:
 On 2015-02-14, Joel Sing j...@sing.id.au wrote:
  The interesting/useful part is:
 
  LOG:  statement: SELECT ... ORDER BY c.oid
  LOG:  server process (PID 11531) was terminated by signal 6: Abort trap
 
  So the server process is being sent a SIGABRT, which is causing it to
  terminate. There is a good chance that this is coming from the stack
  protector, which sends a SIGABRT if the stack is smashed.
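
As a rough illustration of the "terminated by signal 6" log line: this is
how a parent process (the postmaster, in PostgreSQL's case) can learn that a
child died from SIGABRT. It is a minimal sketch only; the function name
signal_from_aborting_child is made up for this example and none of this is
PostgreSQL's actual code.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls abort() and return the number of the signal
 * that terminated it. Returns -1 if fork() or waitpid() failed, or if
 * the child did not die from a signal.
 */
int
signal_from_aborting_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		abort();	/* child: raises SIGABRT against itself */

	/* parent: collect the child's exit status */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On most systems the returned value equals SIGABRT, i.e. 6, which is the
number a supervising process would report in its log.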

 Oh, good call. It could also be a backwards memcpy which would show
 up in /var/log/messages (assuming usual config).


Yup, backward memcpy it is (from /var/log/messages):

Feb 14 12:27:34 elysion postgres: backwards memcpy
Feb 14 12:28:10 elysion last message repeated 8 times
Feb 14 12:30:19 elysion last message repeated 28 times
Feb 14 12:40:28 elysion last message repeated 128 times
Feb 14 12:50:40 elysion last message repeated 128 times
Feb 14 13:00:41 elysion last message repeated 126 times
Feb 14 13:10:42 elysion last message repeated 128 times
Feb 14 13:20:49 elysion last message repeated 126 times
Feb 14 13:30:55 elysion last message repeated 128 times
Feb 14 13:41:06 elysion last message repeated 132 times
Feb 14 13:51:10 elysion last message repeated 128 times
Feb 14 14:01:18 elysion last message repeated 128 times
Feb 14 14:08:18 elysion last message repeated 91 times
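
For context: memcpy() has undefined behaviour when source and destination
overlap, and OpenBSD's libc checks for this, logs "backwards memcpy" and
aborts the process, which would match the signal 6 seen earlier. Below is a
minimal sketch of the kind of overlap check involved. The helper names
regions_overlap and copy_checked are hypothetical, chosen for this example;
this is not OpenBSD's or PostgreSQL's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if [dst, dst+n) and [src, src+n) share at least one byte. */
int
regions_overlap(const void *dst, const void *src, size_t n)
{
	uintptr_t d = (uintptr_t)dst;
	uintptr_t s = (uintptr_t)src;

	if (n == 0)
		return 0;
	/* the regions overlap when the start addresses are < n apart */
	return (d < s) ? (s - d < n) : (d - s < n);
}

/* Copy n bytes, using memmove() when the regions overlap. */
void *
copy_checked(void *dst, const void *src, size_t n)
{
	if (regions_overlap(dst, src, n))
		return memmove(dst, src, n);
	return memcpy(dst, src, n);
}
```

A caller that cannot guarantee non-overlapping buffers would normally just
use memmove() directly; the explicit check is shown only to make the failing
condition visible.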

Am I mistaken in understanding that this is an issue with postgresql itself,
and not a local configuration error?

I tried building postgres with debug symbols (I added the flags described
here[1] to the ports Makefile), but the backtrace is still useless:

# sudo -u _postgresql gdb -q -c postgres.core /usr/local/bin/postgres
Core was generated by `postgres'.
Program terminated with signal 6, Aborted.
Loaded symbols for /usr/local/bin/postgres
#0  0x0bd73424292a in ?? ()
(gdb) bt
#0  0x0bd73424292a in ?? ()
#1  0x in ?? ()

Do I need any further OpenBSD-specific changes to get a useful backtrace?
(I have to admit that I'm not too familiar with debugging with gdb on any
platform).

Thanks for all the feedback so far!

[1]: https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Debugging_the_core_dump_-_example

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-13 Thread Stuart Henderson
On 2015-02-12, Hugo Osvaldo Barrera h...@barrera.io wrote:
 On 2015-02-12 10:18, Stuart Henderson wrote:
 On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:
  Can someone else confirm postgres9.4 works fine on the latest -snapshot?
  (the confirmation would be helpful to reaffirm that it's not an issue with
  some dependency or library).

 Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.


 Ok, so now I know that the issue is on my end. Which leaves me even more
 confused. You're running the latest snapshots too, right? (e.g. the ones from
 Feb 10th?).

 Aside from a clean install, do you have any more changes? Perhaps login.conf?

I have the login.conf section from the example in the pkg-readme,

postgresql:\
:openfiles-cur=768:\
:tc=daemon:

and this in sysctl.conf

# postgresql
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmmax=50331648

sthen@hutch:~:532$ ls -l /bin/ls /usr/local/bin/postgres 
-r-xr-xr-x  1 root  bin   267968 Feb 10 23:19 /bin/ls*
-r-xr-xr-x  1 root  bin  6508711 Feb  9 03:21 /usr/local/bin/postgres*

sthen@hutch:~:533$ sysctl kern.version
kern.version=OpenBSD 5.7-beta (GENERIC) #797: Tue Feb 10 16:26:12 MST 2015
t...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-12 Thread Stuart Henderson
On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:
 Can someone else confirm postgres9.4 works fine on the latest -snapshot?
 (the confirmation would be helpful to reaffirm that it's not an issue with
 some dependency or library).

Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.



Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-12 Thread Hugo Osvaldo Barrera
On 2015-02-12 10:18, Stuart Henderson wrote:
 On 2015-02-11, Hugo Osvaldo Barrera h...@barrera.io wrote:
  Can someone else confirm postgres9.4 works fine on the latest -snapshot?
  (the confirmation would be helpful to reaffirm that it's not an issue with
  some dependency or library).

 Works fine on my bacula box, running 9.4.1 (and previously 9.4.0) on amd64.


Ok, so now I know that the issue is on my end. Which leaves me even more
confused. You're running the latest snapshots too, right? (e.g. the ones from
Feb 10th?).

Aside from a clean install, do you have any more changes? Perhaps login.conf?

Thanks,

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?




Re: postgresql-server exiting abnormally after upgrade to -snapshot

2015-02-11 Thread Hugo Osvaldo Barrera
On 2015-02-11 19:54, Jan Stary wrote:
 On Feb 11 14:49:17, h...@barrera.io wrote:
  Hi,
 
  I upgraded to -snapshot today, and did the whole proper postgresql upgrade:
  pg_dump, moved the old db out of the way, re-init'd, started, and imported.
 
  The thing is, upon receiving connections, postgres dies horribly. The log is
  just the following, repeating over and over:
 
    WARNING:  terminating connection because of crash of another server process
    DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    LOG:  all server processes terminated; reinitializing
    LOG:  database system was interrupted; last known up at 2015-02-11 17:01:00 GMT
    LOG:  database system was not properly shut down; automatic recovery in progress
    LOG:  record with zero length at 0/1696370
    LOG:  redo is not required
    LOG:  database system is ready to accept connections
    LOG:  autovacuum launcher started
    LOG:  server process (PID 9444) was terminated by signal 6: Abort trap
    LOG:  terminating any other active server processes
 
  After much frustration (even building -current), I deleted all of it,
  uninstalled, built 9.3.4 using the old ports recipe, installed - same issue!
 
  It's clearly not an upgrade issue, because deleting all the data files and
  going back to 9.3 has the same issue.

 Have you stopped the DB server before performing the upgrade?
 Are you sure (pgrep -fl post) that there is no other server process
 around?

   Jan


Yes, I did. I also did this when installing the version I built from ports
(which I also tried with no change).

I actually did the entire process a few times, with -snapshots, -current and
installing from packages.

All exhibited the same behaviour, so I'm starting to suspect the issue is not
postgres per se.

  Has anyone else had this issue, or similar issues with -snapshot/-current?
  Can someone else confirm postgres9.4 works fine on the latest -snapshot?
  (the confirmation would be helpful to reaffirm that it's not an issue with
  some dependency or library).
 
  Thanks,
 
  --
  Hugo Osvaldo Barrera
  A: Because we read from top to bottom, left to right.
  Q: Why should I start my reply below the quoted text?
 

--
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?
