Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Christoph Berg
Re: Tom Lane 2014-05-18 26862.1400449...@sss.pgh.pa.us OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about the available stack depth. I'd classify that as a kernel bug. I wonder if it's a different manifestation of this issue:

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Christoph Berg
Re: To Tom Lane 2014-05-19 20140519091808.ga7...@msgid.df7cb.de Re: Tom Lane 2014-05-18 26862.1400449...@sss.pgh.pa.us OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about the available stack depth. I'd classify that as a kernel bug. I wonder if it's a different

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Andres Freund
On 2014-05-19 13:53:18 +0200, Christoph Berg wrote: I've done some more digging. The problem exists also on plain 32bit kernels, not only 64bit running a 32bit userland. (Tested on Debian Wheezy's 3.2.57 kernel.) Too bad. Debian/Ubuntu have been using hardened PostgreSQL builds for years

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Andres Freund
Hi, On 2014-05-19 13:53:18 +0200, Christoph Berg wrote: * PostgreSQL allocates lots of heap using brk() instead of mmap() It doesn't really do that, btw. It's the libc's mmap that makes those decisions, not postgres. Greetings, Andres Freund -- Andres Freund

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: Isn't the far more obvious thing to just not build postgres with -pie on 32bit? It's hardly a security benefit if it allows plain user to crash the server. Yeah, that's what I was doing when I was at Red Hat --- PIE mode would be nice, but not when

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Andres Freund
On 2014-05-19 09:53:11 -0400, Tom Lane wrote: I think throwing an error out of a SIGBUS handler is right out. There would be no way to know exactly what code we were interrupting. It's the same reason we don't let, eg, the SIGALRM handler throw a timeout error directly (in most places

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Christoph Berg
Re: Andres Freund 2014-05-19 20140519141221.gc5...@alap3.anarazel.de On 2014-05-19 09:53:11 -0400, Tom Lane wrote: I think throwing an error out of a SIGBUS handler is right out. There would be no way to know exactly what code we were interrupting. It's the same reason we don't let, eg,

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Christoph Berg
Re: To Tom Lane 2014-05-19 20140519144717.gg7...@msgid.df7cb.de Disabling -pie for all 32bit archs seems to be the way to go for us now. FTR, I've just had a look at armhf (arm-linux-gnueabihf), the address layout looks exactly the same there, and 9.3 crashes easily, so it's really a problem of
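
For readers who want to look at the address layout mentioned above on their own machine, here is a minimal sketch, assuming a Linux system that exposes /proc/<pid>/maps. It only parses the text of a maps file and reports the distance between the end of the [heap] mapping and the start of the [stack] mapping. The function names (parse_regions, heap_to_stack_gap) and the sample addresses are made up for illustration; they are not taken from the thread or from PostgreSQL.

```python
def parse_regions(maps_text):
    """Return {"[heap]": (start, end), "[stack]": (start, end)} for the
    regions found in the text of a /proc/<pid>/maps file."""
    regions = {}
    for line in maps_text.splitlines():
        fields = line.split()
        # Format: address perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] in ("[heap]", "[stack]"):
            start, end = (int(x, 16) for x in fields[0].split("-"))
            regions[fields[5]] = (start, end)
    return regions


def heap_to_stack_gap(maps_text):
    """Bytes between the end of [heap] and the start of [stack],
    or None if either mapping is missing."""
    regions = parse_regions(maps_text)
    if "[heap]" not in regions or "[stack]" not in regions:
        return None
    return regions["[stack]"][0] - regions["[heap]"][1]


# Illustrative input only; these addresses are invented, not measured.
SAMPLE_MAPS = """\
10000000-10021000 rw-p 00000000 00:00 0          [heap]
10800000-10821000 rw-p 00000000 00:00 0          [stack]
"""
```

On a live system the same functions could be fed the contents of /proc/self/maps, or of /proc/<pid>/maps for a running backend. A gap smaller than the stack rlimit would be consistent with the kind of layout problem discussed in this thread, though the sketch by itself does not prove it.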

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-19 Thread Tom Lane
Christoph Berg c...@df7cb.de writes: FTR, I've just had a look at armhf (arm-linux-gnueabihf), the address layout looks exactly the same there, and 9.3 crashes easily, so it's really a problem of all Linux 32bit archs. I'm puzzled the regression tests passed there [1], but anyway, we'll

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Christoph Berg
Re: Tom Lane 2014-05-18 9058.1400385...@sss.pgh.pa.us Christoph Berg c...@df7cb.de writes: Re: Tom Lane 2014-05-14 1357.1400028...@sss.pgh.pa.us It would appear that something is wrong with check_stack_depth(), and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Andres Freund
On 2014-05-18 11:08:34 +0200, Christoph Berg wrote: Interestingly, the Debian buildd managed to run the testsuite for i386, while I could reproduce the problem on the pgapt build machine and on my notebook, so there must be some system difference. Possibly the reason is these two

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Christoph Berg
Re: Andres Freund 2014-05-18 20140518091445.gu23...@alap3.anarazel.de Did you measure how large the stack actually was when you got the SIGBUS? Should be possible to determine that by computing the offset using some local stack variable in one of the deepest stack frames. Looking at

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Tom Lane
Christoph Berg c...@df7cb.de writes: Re: Andres Freund 2014-05-18 20140518091445.gu23...@alap3.anarazel.de Did you measure how large the stack actually was when you got the SIGBUS? Should be possible to determine that by computing the offset using some local stack variable in one of the

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Andres Freund
On 2014-05-18 17:41:17 -0400, Tom Lane wrote: Christoph Berg c...@df7cb.de writes: Re: Andres Freund 2014-05-18 20140518091445.gu23...@alap3.anarazel.de Did you measure how large the stack actually was when you got the SIGBUS? Should be possible to determine that by computing the offset

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2014-05-18 17:41:17 -0400, Tom Lane wrote: OK, so the problem is that getrlimit(RLIMIT_STACK) is lying to us about the available stack depth. I'd classify that as a kernel bug. I wonder if it's a different manifestation of this issue:

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Andres Freund
On 2014-05-18 23:52:32 +0200, Andres Freund wrote: On 2014-05-18 17:41:17 -0400, Tom Lane wrote: Christoph Berg c...@df7cb.de writes: Re: Andres Freund 2014-05-18 20140518091445.gu23...@alap3.anarazel.de Did you measure how large the stack actually was when you got the SIGBUS? Should

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-18 Thread Andres Freund
On 2014-05-18 17:56:48 -0400, Tom Lane wrote: The bad news is that the kernel guys have been ignoring the issue for over a year. Dunno if some pressure from the Debian camp would help raise their priority for this. I guess we should forward the bug to the lkml/linux-mm lists. I think a fair

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-17 Thread Christoph Berg
Re: Tom Lane 2014-05-14 1357.1400028...@sss.pgh.pa.us Christoph Berg c...@df7cb.de writes: Building 9.4 beta1 on Debian sid/i386 fails during the regression tests. amd64 works fine, as does i386 on the released distributions. It would appear that something is wrong with

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-17 Thread Tom Lane
Christoph Berg c...@df7cb.de writes: Re: Tom Lane 2014-05-14 1357.1400028...@sss.pgh.pa.us It would appear that something is wrong with check_stack_depth(), and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack. ulimit -s is 8192 (kB); max_stack_depth is 2MB.
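
The figures quoted above (ulimit -s of 8192 kB, max_stack_depth of 2MB) can be compared with a few lines of code. The following is a minimal sketch, assuming a Unix system where Python's resource module is available; the helper names stack_headroom and current_stack_rlimit are hypothetical and not part of PostgreSQL. Note that it only reports what getrlimit() claims, which is exactly the value the thread suspects of not matching the stack that is really usable.

```python
import resource

KB = 1024
MB = 1024 * KB


def stack_headroom(rlimit_bytes, max_stack_depth_bytes):
    """Bytes left between a configured stack depth limit and the
    stack size reported by the resource limit."""
    return rlimit_bytes - max_stack_depth_bytes


def current_stack_rlimit():
    """Soft RLIMIT_STACK of this process in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


# Values from the message: 8192 kB reported limit, 2 MB max_stack_depth.
reported_headroom = stack_headroom(8192 * KB, 2 * MB)
```

With the numbers from the message, the reported limit leaves 6 MB of nominal headroom above max_stack_depth, so a crash at that depth suggests the reported limit is not what the process can actually use.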

[HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-13 Thread Christoph Berg
Building 9.4 beta1 on Debian sid/i386 fails during the regression tests. amd64 works fine, as does i386 on the released distributions. parallel group (11 tests): create_cast create_aggregate drop_if_exists typed_table create_function_3 vacuum constraints create_table_like triggers inherit

Re: [HACKERS] 9.4 beta1 crash on Debian sid/i386

2014-05-13 Thread Tom Lane
Christoph Berg c...@df7cb.de writes: Building 9.4 beta1 on Debian sid/i386 fails during the regression tests. amd64 works fine, as does i386 on the released distributions. It would appear that something is wrong with check_stack_depth(), and/or getrlimit(RLIMIT_STACK) is lying to us about the