Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Hi guys, Sorry for late response. I've just tested current git master. It seems the issue is gone now. Wily, thank you for looking into the issue so close. Anticipating latest 1.6.x in official Debian/Ubuntu packages soon. Thanks, Andrey On Tue, Jun 21, 2016 at 7:50 PM, Willy Tarreau wrote: > Thus I backported all of this to 1.6 planning for 1.6.6. It would be nice > if the people affected with issues could give it a try this week (either > from git or just wait for tomorrow morning to get the latest snapshot).
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Yes I noticed there were more issues with the FD's. Thanks for all of your work, I will test 1.6.6 as soon as it hits vbernat's PPA. Best, Luke smime.p7s Description: S/MIME Cryptographic Signature
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Hi guys, I'm replying to this thread and adding Andrey and Veiko in Cc to keep everyone up to date. On Wed, Jun 08, 2016 at 10:17:19AM +0900, Simon Horman wrote: > Lukas, could you try this? > > diff --git a/src/checks.c b/src/checks.c > index c4ac947b6051..e65d28f7c3c6 100644 > --- a/src/checks.c > +++ b/src/checks.c > @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t) > if (pid == 0) { > /* Child */ > extern char **environ; > + > + if ((global.mode & MODE_QUIET) && !(global.mode & > MODE_VERBOSE)) { > + close(0); > + close(1); > + } > + > environ = check->envp; > extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, > ultoa_r(s->cur_sess, buf, sizeof(buf))); > execvp(px->check_command, check->argv); So since this patch we got Andrey's report and figured other issues caused by sharing FDs between the processes (namely the fact that automatic FD deletion doesn't happen in epoll until the child closes it as well, possibly causing polling loops or fake event reports). Thus I went down the safe route for now, by closing *all* FDs. This is fast enough, on my machine I can close around 17 million FDs per second. However I liked Simon's approach above consisting in keeping stdin/stdout/ stderr open when in verbose mode. That's very handy to debug. While testing this change, I discovered a very interesting issue. My process would randomly segfault after a delay ranging from 1 to 10 seconds. The core files didn't make any sense, proving that the memory was corrupted. In the end I found the culprit. The problem is random signal delivery. In the signal handler, a call to task_wakeup() is made, and this sometimes destroys the scheduler when it's already in the signal handler, as the rq_next pointer is sometimes nulled then dereferenced, and sometimes the ebtree structure is corrupted. So I changed this to make use of the asynchronous signal delivery and now it's OK since the functions are called after the poll loop. This made me remember that Veiko who used to experience random segfaults with zlib also has external checks configured and also had a non-sensical core file, and zlib has a few orders of magniture higher latency than the rest of the process, that makes it much more likely that a signal is delivered here than anywhere else. Thus I backported all of this to 1.6 planning for 1.6.6. It would be nice if the people affected with issues could give it a try this week (either from git or just wait for tomorrow morning to get the latest snapshot). I also developped a patch to block all signals out of the poll loop. This makes the code cleaner but I realized that it only improves the performance for fork-limited workloads and will decrease it for regular workloads. So for now it remains unmerged before I run more tests. On Linux we can use epoll_pwait() to remove a few syscalls so the impact might be less important. Anyway I suspect that over the longterm we'll go this way. Thanks to all those who helped on this, and do not hesitate to report any remaining issue! Willy
[BUMP] Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
BUMP The patch looks good to me and should be merged. Best, Luke On 08.06.2016 03:17, Simon Horman wrote: > On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote: >> On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote: >>> You can always open /dev/null before chrooting and dup() it into FD 0 and 1 >>> after chroot() has been called. >> >> I'd be more tempted to simply close those FDs after the fork(). That >> may improve the ability to detect faulty scripts which try to dump >> GBs of data. >> >> A very long time ago I've seen a health check perform an LDAP search >> retrieving all the hundreds of thousands of members of a group, and >> the people in charge for the server were complaining that the health >> checks were hurting the server... Better have the script fail with a >> broken pipe in this case. >> >> Just a suggestion. > > Thanks, I think that is reasonable. I particularly like its simplicity. > > Lukas, could you try this? > > diff --git a/src/checks.c b/src/checks.c > index c4ac947b6051..e65d28f7c3c6 100644 > --- a/src/checks.c > +++ b/src/checks.c > @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t) > if (pid == 0) { > /* Child */ > extern char **environ; > + > + if ((global.mode & MODE_QUIET) && !(global.mode & > MODE_VERBOSE)) { > + close(0); > + close(1); > + } > + > environ = check->envp; > extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, > ultoa_r(s->cur_sess, buf, sizeof(buf))); > execvp(px->check_command, check->argv); > smime.p7s Description: S/MIME Cryptographic Signature
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Ah, this is fun :D Running this haproxy built from git: > erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -vv > HA-Proxy version 1.7-dev3-4b788f-35 2016/06/08 > Copyright 2000-2016 Willy Tarreau > > Build options : > TARGET = linux2628 > CPU = generic > CC = gcc > CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement > OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1 > > Default settings : > maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 > > Encrypted password support via crypt(3): yes > Built with zlib version : 1.2.8 > Compression algorithms supported : identity("identity"), deflate("deflate"), > raw-deflate("deflate"), gzip("gzip") > Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014 > Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014 > OpenSSL library supports TLS extensions : yes > OpenSSL library supports SNI : yes > OpenSSL library supports prefer-server-ciphers : yes > Built with PCRE version : 8.31 2012-07-06 > PCRE library supports JIT : no (USE_PCRE_JIT not set) > Built without Lua support > Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT > IP_FREEBIND > > Available polling systems : > epoll : pref=300, test result OK >poll : pref=200, test result OK > select : pref=150, test result OK > Total: 3 (3 usable), will use epoll. > > Available filters : > [COMP] compression > [TRACE] trace Running this commandline: > erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -d -f /etc/haproxy/haproxy.cfg > > > Results in the ldapcheck output going out to just regular stdout. So I'll have to run haproxy daemonized to reproduce the problem... > erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -f /etc/haproxy/haproxy.cfg This results in the expected erroneous output on the first client connection. Now to apply the patch and rebuild... Running haproxy with -d still gives the check output on haproxy stdout, which is ok I suppose. Running haproxy daemonized... gives no check output. Good! So from this small test I believe the patch works. Best, Luke On 08.06.2016 03:17, Simon Horman wrote: > On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote: >> On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote: >>> You can always open /dev/null before chrooting and dup() it into FD 0 and 1 >>> after chroot() has been called. >> >> I'd be more tempted to simply close those FDs after the fork(). That >> may improve the ability to detect faulty scripts which try to dump >> GBs of data. >> >> A very long time ago I've seen a health check perform an LDAP search >> retrieving all the hundreds of thousands of members of a group, and >> the people in charge for the server were complaining that the health >> checks were hurting the server... Better have the script fail with a >> broken pipe in this case. >> >> Just a suggestion. > > Thanks, I think that is reasonable. I particularly like its simplicity. > > Lukas, could you try this? > > diff --git a/src/checks.c b/src/checks.c > index c4ac947b6051..e65d28f7c3c6 100644 > --- a/src/checks.c > +++ b/src/checks.c > @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t) > if (pid == 0) { > /* Child */ > extern char **environ; > + > + if ((global.mode & MODE_QUIET) && !(global.mode & > MODE_VERBOSE)) { > + close(0); > + close(1); > + } > + > environ = check->envp; > extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, > ultoa_r(s->cur_sess, buf, sizeof(buf))); > execvp(px->check_command, check->argv); > -- Mit freundlichen Gruessen, Lukas Erlacher -- Rechnerbetriebsgruppe der Fakultäten Mathematik und Informatik Raum 00.05.042 Tel. 089-289-18258 erlac...@in.tum.de Technische Universität München - Boltzmannstr. 3 - 85748 Garching smime.p7s Description: S/MIME Cryptographic Signature
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote: > On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote: > > You can always open /dev/null before chrooting and dup() it into FD 0 and 1 > > after chroot() has been called. > > I'd be more tempted to simply close those FDs after the fork(). That > may improve the ability to detect faulty scripts which try to dump > GBs of data. > > A very long time ago I've seen a health check perform an LDAP search > retrieving all the hundreds of thousands of members of a group, and > the people in charge for the server were complaining that the health > checks were hurting the server... Better have the script fail with a > broken pipe in this case. > > Just a suggestion. Thanks, I think that is reasonable. I particularly like its simplicity. Lukas, could you try this? diff --git a/src/checks.c b/src/checks.c index c4ac947b6051..e65d28f7c3c6 100644 --- a/src/checks.c +++ b/src/checks.c @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t) if (pid == 0) { /* Child */ extern char **environ; + + if ((global.mode & MODE_QUIET) && !(global.mode & MODE_VERBOSE)) { + close(0); + close(1); + } + environ = check->envp; extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); execvp(px->check_command, check->argv);
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote: > You can always open /dev/null before chrooting and dup() it into FD 0 and 1 > after chroot() has been called. I'd be more tempted to simply close those FDs after the fork(). That may improve the ability to detect faulty scripts which try to dump GBs of data. A very long time ago I've seen a health check perform an LDAP search retrieving all the hundreds of thousands of members of a group, and the people in charge for the server were complaining that the health checks were hurting the server... Better have the script fail with a broken pipe in this case. Just a suggestion. Willy
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
You can always open /dev/null before chrooting and dup() it into FD 0 and 1 after chroot() has been called. Le 7 juin 2016 06:23:04 GMT+02:00, Simon Horman a écrit : >Hi Cyril, Hi Lukas, > >On Mon, Jun 06, 2016 at 08:21:46PM +0200, Cyril Bonté wrote: >> Hi Lukas, >> >> I add Malcolm and Simon to the thread. >> >> Le 06/06/2016 à 08:36, Lukas Erlacher a écrit : >> >Additional info: The output only ends up in the *first* client >connection. >> > >> >Talking about this with some colleagues we're now theorizing that >stdout goes to fd 1 and fd 1 is also the first client connection >socket. Might be helpful for tracking this down. >> >> After doing a quick test, I can confirm I can reproduce the issue >(once >> working in daemon mode). >> >> Simon, do you have time to work on a fix ? or should someone else do >? (I >> think I'll be available to work on this, but only by the end of the >week) >> >> Also Malcolm, as I remember you are using this feature, be aware that >you >> may hit the issue too. > >looking over the code a little the theory regarding fd 1 seems entirely >plausible as stdout, along with stdin and stdin, may be be closed >in haproxy.c:main() > >It seems to me that if the case where they were closed, and thus fd 1 >and 2 >may be used for other purposes, it would be prudent to redirect fd 1 >and 2 >to "/dev/null". > >One problem I see with this is that if haproxy is running in a chroot >and /dev/null is not present then open() will fail. > >Lukas, would it be possible for you to test the following >(I have only compile tested it) ? > > >diff --git a/src/checks.c b/src/checks.c >index c4ac947b6051..eca7df62522f 100644 >--- a/src/checks.c >+++ b/src/checks.c >@@ -1836,6 +1836,16 @@ static int connect_proc_chk(struct task *t) > if (pid == 0) { > /* Child */ > extern char **environ; >+ >+ if ((global.mode & MODE_QUIET) && !(global.mode & >MODE_VERBOSE)) { >+ close(0); >+ close(1); >+ >+ if (open("/dev/null", 0, O_WRONLY) || open("/dev/null", >0, >O_WRONLY)) { >+ exit (-1); >+ } >+ } >+ > environ = check->envp; > extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, >ultoa_r(s->cur_sess, buf, sizeof(buf))); > execvp(px->check_command, check->argv); -- Envoyé de mon téléphone Android avec K-9 Mail. Excusez la brièveté.
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Hi Cyril, Hi Lukas, On Mon, Jun 06, 2016 at 08:21:46PM +0200, Cyril Bonté wrote: > Hi Lukas, > > I add Malcolm and Simon to the thread. > > Le 06/06/2016 à 08:36, Lukas Erlacher a écrit : > >Additional info: The output only ends up in the *first* client connection. > > > >Talking about this with some colleagues we're now theorizing that stdout > >goes to fd 1 and fd 1 is also the first client connection socket. Might be > >helpful for tracking this down. > > After doing a quick test, I can confirm I can reproduce the issue (once > working in daemon mode). > > Simon, do you have time to work on a fix ? or should someone else do ? (I > think I'll be available to work on this, but only by the end of the week) > > Also Malcolm, as I remember you are using this feature, be aware that you > may hit the issue too. looking over the code a little the theory regarding fd 1 seems entirely plausible as stdout, along with stdin and stdin, may be be closed in haproxy.c:main() It seems to me that if the case where they were closed, and thus fd 1 and 2 may be used for other purposes, it would be prudent to redirect fd 1 and 2 to "/dev/null". One problem I see with this is that if haproxy is running in a chroot and /dev/null is not present then open() will fail. Lukas, would it be possible for you to test the following (I have only compile tested it) ? diff --git a/src/checks.c b/src/checks.c index c4ac947b6051..eca7df62522f 100644 --- a/src/checks.c +++ b/src/checks.c @@ -1836,6 +1836,16 @@ static int connect_proc_chk(struct task *t) if (pid == 0) { /* Child */ extern char **environ; + + if ((global.mode & MODE_QUIET) && !(global.mode & MODE_VERBOSE)) { + close(0); + close(1); + + if (open("/dev/null", 0, O_WRONLY) || open("/dev/null", 0, O_WRONLY)) { + exit (-1); + } + } + environ = check->envp; extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, ultoa_r(s->cur_sess, buf, sizeof(buf))); execvp(px->check_command, check->argv);
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Hi Lukas, I add Malcolm and Simon to the thread. Le 06/06/2016 à 08:36, Lukas Erlacher a écrit : Additional info: The output only ends up in the *first* client connection. Talking about this with some colleagues we're now theorizing that stdout goes to fd 1 and fd 1 is also the first client connection socket. Might be helpful for tracking this down. After doing a quick test, I can confirm I can reproduce the issue (once working in daemon mode). Simon, do you have time to work on a fix ? or should someone else do ? (I think I'll be available to work on this, but only by the end of the week) Also Malcolm, as I remember you are using this feature, be aware that you may hit the issue too. -- Cyril Bonté
Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions
Additional info: The output only ends up in the *first* client connection. Talking about this with some colleagues we're now theorizing that stdout goes to fd 1 and fd 1 is also the first client connection socket. Might be helpful for tracking this down. smime.p7s Description: S/MIME Cryptographic Signature
external-check stdout ends up in load-balanced traffic, destroying tcp sessions
I am absolutely flabbergasted... I have a "simple" transparent proxy application with two ldap backends. Here is my haproxy config: > erlacher@vmrbg80:~$ cat /etc/haproxy/haproxy.cfg > # Managed by saltstack > global > log /dev/loglocal0 > log /dev/loglocal1 notice > # chroot /var/lib/haproxy > # user haproxy > # group haproxy > daemon > external-check > > defaults > log global > modetcp > option tcplog > option dontlognull > timeout connect 5000 > timeout client 1d > timeout server 1d > errorfile 400 /etc/haproxy/errors/400.http > errorfile 403 /etc/haproxy/errors/403.http > errorfile 408 /etc/haproxy/errors/408.http > errorfile 500 /etc/haproxy/errors/500.http > errorfile 502 /etc/haproxy/errors/502.http > errorfile 503 /etc/haproxy/errors/503.http > errorfile 504 /etc/haproxy/errors/504.http > > frontend ft_ldaps > bind ldap-ha-ft:636 transparent > default_backend bk_ldaps > backend bk_ldaps > balance source > source 0.0.0.0 usesrc clientip > option external-check > external-check command /usr/local/bin/ldapcheck.sh > server ldap1 ldap-ha-bk1:636 check > server ldap2 ldap-ha-bk2:636 check > > frontend ft_ldap > bind ldap-ha-ft:389 transparent > default_backend bk_ldap > backend bk_ldap > balance source > source 0.0.0.0 usesrc clientip > option external-check > external-check command /usr/local/bin/ldapcheck.sh > server ldap1 ldap-ha-bk1:389 check > server ldap2 ldap-ha-bk2:389 check Here is the ldapcheck.sh script: > erlacher@vmrbg80:~$ cat /usr/local/bin/ldapcheck.sh > #!/bin/bash > > backend=$3 > > #/usr/bin/ldapsearch -x -L -H "ldap://$backend"; -b ou=blabliblub '(mail=!)' dn > > exit 0 I will neglect to post all of the sysctl, ip rule, ip route etc. etc. settings, they roughly follow the instructions in [1] with my own improvements previously posted to this ML and they work. Everything works in this configuration. HOWEVER, if I delete the "exit 0" and uncomment the ldapsearch call so it's actually a functional health check, the output of the ldapsearch ends up in the load balanced tcp traffic! > erlacher@atrbg13 ~ % telnet ldap-ha-ft 389 > > > ~ > Trying 131.159.255.33... > Connected to ldap-ha-ft. > Escape character is '^]'. > version: 1 > > # > # LDAPv3 > # base with scope subtree > # filter: (mail=!) > # requesting: dn > # > > # search result I can, of course, fix this by redirecting the output of the ldapsearch command to /dev/null, but why the heck does this end up in the tcp traffic anyway? Needless to say, this completely clobbers the actual client connection and it gets closed immediately. Best regards, Luke :D [1] http://blog.haproxy.com/2013/09/16/howto-transparent-proxying-and-binding-with-haproxy-and-aloha-load-balancer/ haproxy -vv: > erlacher@vmrbg80:~$ sudo haproxy -vv > HA-Proxy version 1.6.5 2016/05/10 > Copyright 2000-2016 Willy Tarreau > > Build options : > TARGET = linux2628 > CPU = generic > CC = gcc > CFLAGS = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat > -Werror=format-security -D_FORTIFY_SOURCE=2 > OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 > > Default settings : > maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 > > Encrypted password support via crypt(3): yes > Built with zlib version : 1.2.8 > Compression algorithms supported : identity("identity"), deflate("deflate"), > raw-deflate("deflate"), gzip("gzip") > Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014 > Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014 > OpenSSL library supports TLS extensions : yes > OpenSSL library supports SNI : yes > OpenSSL library supports prefer-server-ciphers : yes > Built with PCRE version : 8.31 2012-07-06 > PCRE library supports JIT : no (USE_PCRE_JIT not set) > Built with Lua version : Lua 5.3.1 > Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT > IP_FREEBIND > > Available polling systems : > epoll : pref=300, test result OK >poll : pref=200, test result OK > select : pref=150, test result OK > Total: 3 (3 usable), will use epoll. -- Mit freundlichen Gruessen, Lukas Erlacher -- Rechnerbetriebsgruppe der Fakultäten Mathematik und Informatik Raum 00.05.042 Tel. 089-289-18258 erlac...@in.tum.de Technische Universität München - Boltzmannstr. 3 - 85748 Garching smime.p7s Description: S/MIME Cryptographic Signature