RE: possible crashes on linux with recent glibc
Hi Willy,

>> Your description corresponds with my configuration (using select() with glibc 2.15 on Ubuntu, crashing with some load).
>>
>> On the terminal I see (which is what confuses a bit):
>>
>> *** buffer overflow detected ***: ./haproxy terminated
>>
>> and the backtrace looks like this:
>>
>> (gdb) backtrace full
>> #0 0xb76e2424 in __kernel_vsyscall ()
>> No symbol table info available.
>> #1 0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #2 0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #3 0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #4 0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #5 0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #6 0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #7 0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65
>>
>> I'm quite sure it's exactly this problem, but I prefer to double check with you.
>
> Yes, it was the exact same trace I used to get when using select() with too large file descriptors. I really think that this glibc change will break a large amount of software...

Thanks.

Yes, indeed. I wonder why it doesn't crash without compiler optimization (-O0) though.

Anyway, thanks for confirming the backtrace.

Regards,
Lukas
Re: possible crashes on linux with recent glibc
Hi Lukas,

On Thu, Mar 06, 2014 at 09:54:44AM +0100, Lukas Tribus wrote:
> Hi Willy,
>
>>> Your description corresponds with my configuration (using select() with glibc 2.15 on Ubuntu, crashing with some load).
>>>
>>> On the terminal I see (which is what confuses a bit):
>>>
>>> *** buffer overflow detected ***: ./haproxy terminated
>>>
>>> and the backtrace looks like this:
>>>
>>> (gdb) backtrace full
>>> #0 0xb76e2424 in __kernel_vsyscall ()
>>> No symbol table info available.
>>> #1 0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #2 0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #3 0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #4 0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #5 0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #6 0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
>>> No symbol table info available.
>>> #7 0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65
>>>
>>> I'm quite sure it's exactly this problem, but I prefer to double check with you.
>>
>> Yes, it was the exact same trace I used to get when using select() with too large file descriptors. I really think that this glibc change will break a large amount of software...
>
> Thanks.
>
> Yes, indeed. I wonder why it doesn't crash without compiler optimization (-O0) though.

I suspect that the FD_SET macros might be declared as functions instead of macros and that they check the parameter before dereferencing the array. That's just a guess.

Willy
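One possible explanation for the -O0 observation, not confirmed anywhere in this thread: glibc only enables its fortified wrappers (including the FD_SET range check seen in the backtrace) when _FORTIFY_SOURCE is set and the compiler is optimizing, and some distributions turn _FORTIFY_SOURCE on by default for optimized builds. The sketch below reports the level glibc settled on for the current build. It assumes glibc's internal __USE_FORTIFY_LEVEL macro is visible after including a libc header; fortify_level() is a hypothetical helper name, and -1 is returned where the macro does not exist.

```c
#include <sys/select.h>

/* Hypothetical helper, not part of haproxy or glibc.
 * Returns the fortify level glibc selected for this translation unit
 * (0 means the fortified checks are compiled out), or -1 when the
 * glibc-internal macro is not defined, e.g. with another C library. */
static int fortify_level(void)
{
#ifdef __USE_FORTIFY_LEVEL
    return __USE_FORTIFY_LEVEL;
#else
    return -1;
#endif
}
```

Building the same file once with -O0 and once with -O2 and printing fortify_level() should show whether the two builds differ in this respect on a given system.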
RE: possible crashes on linux with recent glibc
Hi Willy,

> Chris Allen and Jeff Zellner reported a similar issue at the same time on two different versions : 1.4.20 and 1.5-dev17. The symptom is always the same, haproxy suddenly started to crash under load while it did not in the past.
>
> When looking deeper into the traces and core files, it happens that both versions were built with TARGET=generic, so haproxy was using select() to poll for new events.
>
> The issue was tracked down to a recent update to glibc which now verifies that the file descriptor number passed to FD_SET/FD_CLR/FD_ISSET lies between 0 and FD_SETSIZE-1 (1023) :
>
>   http://repo.or.cz/w/glibc.git/commitdiff/a0f33f996
>
> I believe it was merged into glibc 2.16 and backported in the glibc 2.15 as shipped with Ubuntu 12.04.

Sorry to wake up this one-year-old thread. I just hit a crash while playing with older code and want to confirm that it's 'only' this particular (known) problem I'm hitting, not a hidden bug (or whatever).

Your description corresponds with my configuration (using select() with glibc 2.15 on Ubuntu, crashing with some load).

On the terminal I see (which is what confuses a bit):

*** buffer overflow detected ***: ./haproxy terminated

and the backtrace looks like this:

(gdb) backtrace full
#0 0xb76e2424 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#2 0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#3 0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#4 0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#5 0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#6 0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#7 0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65

I'm quite sure it's exactly this problem, but I prefer to double check with you.

Thanks,
Lukas
Re: possible crashes on linux with recent glibc
Hi Lukas,

On Wed, Mar 05, 2014 at 07:38:42PM +0100, Lukas Tribus wrote:
> Hi Willy,
>
>> Chris Allen and Jeff Zellner reported a similar issue at the same time on two different versions : 1.4.20 and 1.5-dev17. The symptom is always the same, haproxy suddenly started to crash under load while it did not in the past.
>>
>> When looking deeper into the traces and core files, it happens that both versions were built with TARGET=generic, so haproxy was using select() to poll for new events.
>>
>> The issue was tracked down to a recent update to glibc which now verifies that the file descriptor number passed to FD_SET/FD_CLR/FD_ISSET lies between 0 and FD_SETSIZE-1 (1023) :
>>
>>   http://repo.or.cz/w/glibc.git/commitdiff/a0f33f996
>>
>> I believe it was merged into glibc 2.16 and backported in the glibc 2.15 as shipped with Ubuntu 12.04.
>
> Sorry to wake up this one-year-old thread. I just hit a crash while playing with older code and want to confirm that it's 'only' this particular (known) problem I'm hitting, not a hidden bug (or whatever).
>
> Your description corresponds with my configuration (using select() with glibc 2.15 on Ubuntu, crashing with some load).
>
> On the terminal I see (which is what confuses a bit):
>
> *** buffer overflow detected ***: ./haproxy terminated
>
> and the backtrace looks like this:
>
> (gdb) backtrace full
> #0 0xb76e2424 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #2 0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #3 0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #4 0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #5 0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #6 0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #7 0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65
>
> I'm quite sure it's exactly this problem, but I prefer to double check with you.

Yes, it was the exact same trace I used to get when using select() with too large file descriptors. I really think that this glibc change will break a large amount of software...

Regards,
Willy
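For readers who want to see the range rule in isolation: the condition quoted above (a descriptor must lie between 0 and FD_SETSIZE-1 before it is passed to FD_SET) can also be checked by the caller. This is a minimal sketch, not haproxy's or glibc's actual code; fd_fits_in_fd_set() and checked_fd_set() are hypothetical names.

```c
#include <sys/select.h>

/* Hypothetical helper: 1 if fd may legally be stored in an fd_set,
 * 0 otherwise. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: marks fd in *set only when it is in range.
 * Returns 0 on success and -1 when fd is out of range, so the caller
 * gets an error code instead of an abort from the C library. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A caller would FD_ZERO() the set first and treat a -1 return as "this descriptor cannot be handled by select()".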
Re: possible crashes on linux with recent glibc
Hi Bryan,

On Mon, Apr 01, 2013 at 07:11:25PM -0700, Bryan Talbot wrote:
> haproxy built with macports on OSX seems to only have support for select() and not poll(). I don't have any suggestions but is this environment impacted by your proposed changes?

It's a Makefile issue, OSX supports select(), poll() and kqueue(). And BTW, OSX is one of those causing issues with select() and fd >= 1024 according to the man page. In fact all operating systems where haproxy may be built support poll().

> Not running haproxy on osx for anything other than localhost development mode of course, but keeping it working on osx would be great.

You're right, I'm going to fix the makefile right now.

> $ /opt/local/sbin/haproxy -vv
> HA-Proxy version 1.4.22 2012/08/09
> Copyright 2000-2012 Willy Tarreau w...@1wt.eu
>
> Build options :
>   TARGET = osx
             ^^^
The issue is here above. The osx target is not defined so no option is taken. So first I'll define such a target because it makes sense to have it, and second, I'll enable POLL by default when the target is unknown. And this way you'll get a better development platform :-)

Thanks,
Willy
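To illustrate why poll() sidesteps the problem: it takes an array of descriptors rather than a fixed-size bit set, so the descriptor value itself is not limited by FD_SETSIZE. The following is a minimal sketch and not haproxy's poller; wait_readable() and poll_demo() are hypothetical names.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable within timeout_ms,
 * 0 on timeout, -1 on error. No fd_set is involved. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Small self-check on a pipe: returns 0 if poll() reported "not
 * readable" before a write and "readable" after it, -1 otherwise. */
static int poll_demo(void)
{
    int p[2];
    int before, after;

    if (pipe(p) < 0)
        return -1;
    before = wait_readable(p[0], 0);   /* nothing written yet */
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    after = wait_readable(p[0], 0);    /* one byte is pending */
    close(p[0]);
    close(p[1]);
    return (before == 0 && after == 1) ? 0 : -1;
}
```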
Re: possible crashes on linux with recent glibc
On Fri, Mar 29, 2013 at 11:01 AM, Willy Tarreau w...@1wt.eu wrote:
> Hi,
>
> For the medium term, I'm going to prepare the following changes :
>
> - make poll() rely solely on bit fields without using FD_* macros
>
> - add a startup warning when select() is used with a maxconn leading to more than FD_SETSIZE fds, followed by a runtime test to make it crash in glibc while parsing the config if needed instead of reserving a Friday evening surprise for you.
>
> - enable poll() by default in the generic target, as it's supported on all platforms where haproxy is known to build

haproxy built with macports on OSX seems to only have support for select() and not poll(). I don't have any suggestions but is this environment impacted by your proposed changes?

Not running haproxy on osx for anything other than localhost development mode of course, but keeping it working on osx would be great.

$ /opt/local/sbin/haproxy -vv
HA-Proxy version 1.4.22 2012/08/09
Copyright 2000-2012 Willy Tarreau w...@1wt.eu

Build options :
  TARGET = osx
  CPU = generic
  CC = /usr/bin/clang -arch x86_64
  CFLAGS = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_LIBCRYPT=1 USE_REGPARM=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
  select : pref=150, test result OK
Total: 1 (1 usable), will use select.
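Two of the changes listed in the quoted plan can be sketched in a few lines: tracking descriptors in a plain bit field sized at run time instead of an fd_set, and a startup test for "more fds than select() can handle". This is only an illustration under assumptions, not the code that went into haproxy; struct fdmap, the fdmap_*() functions and select_limit_exceeded() are hypothetical names.

```c
#include <limits.h>
#include <stdlib.h>
#include <sys/select.h>

#define FDMAP_WORD_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical bit field sized at run time, so it is not tied to
 * FD_SETSIZE and never goes through the FD_* macros. */
struct fdmap {
    unsigned long *words;
    int nbits;
};

static int fdmap_init(struct fdmap *m, int nbits)
{
    size_t nwords;

    if (nbits <= 0)
        return -1;
    nwords = ((size_t)nbits + FDMAP_WORD_BITS - 1) / FDMAP_WORD_BITS;
    m->words = calloc(nwords, sizeof(*m->words));
    if (!m->words)
        return -1;
    m->nbits = nbits;
    return 0;
}

static void fdmap_free(struct fdmap *m)
{
    free(m->words);
    m->words = NULL;
    m->nbits = 0;
}

static int fdmap_valid(const struct fdmap *m, int fd)
{
    return fd >= 0 && fd < m->nbits;
}

/* Out-of-range descriptors are ignored rather than written out of bounds. */
static void fdmap_set(struct fdmap *m, int fd)
{
    if (fdmap_valid(m, fd))
        m->words[fd / FDMAP_WORD_BITS] |= 1UL << (fd % FDMAP_WORD_BITS);
}

static void fdmap_clr(struct fdmap *m, int fd)
{
    if (fdmap_valid(m, fd))
        m->words[fd / FDMAP_WORD_BITS] &= ~(1UL << (fd % FDMAP_WORD_BITS));
}

static int fdmap_isset(const struct fdmap *m, int fd)
{
    return fdmap_valid(m, fd) &&
           ((m->words[fd / FDMAP_WORD_BITS] >> (fd % FDMAP_WORD_BITS)) & 1UL);
}

/* Startup test: 1 when the configured number of sockets would need
 * descriptors that do not fit in an fd_set, 0 otherwise. */
static int select_limit_exceeded(int maxsock)
{
    return maxsock > FD_SETSIZE;
}
```

A configuration parser could call select_limit_exceeded() with the socket count derived from maxconn and print a warning when it returns 1 and select() is the only poller available.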