Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-26 Thread Andrey Galkin
Hi guys,

Sorry for late response. I've just tested current git master. It seems
the issue is gone now.

Wily, thank you for looking into the issue so close.

Anticipating latest 1.6.x in official Debian/Ubuntu packages soon.

Thanks,
Andrey

On Tue, Jun 21, 2016 at 7:50 PM, Willy Tarreau  wrote:
> Thus I backported all of this to 1.6 planning for 1.6.6. It would be nice
> if the people affected with issues could give it a try this week (either
> from git or just wait for tomorrow morning to get the latest snapshot).



Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-22 Thread Lukas Erlacher
Yes I noticed there were more issues with the FD's. Thanks for all of your 
work, I will test 1.6.6 as soon as it hits vbernat's PPA.

Best,
Luke



smime.p7s
Description: S/MIME Cryptographic Signature


Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-21 Thread Willy Tarreau
Hi guys,

I'm replying to this thread and adding Andrey and Veiko in Cc to keep
everyone up to date.

On Wed, Jun 08, 2016 at 10:17:19AM +0900, Simon Horman wrote:
> Lukas, could you try this?
> 
> diff --git a/src/checks.c b/src/checks.c
> index c4ac947b6051..e65d28f7c3c6 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t)
>   if (pid == 0) {
>   /* Child */
>   extern char **environ;
> +
> + if ((global.mode & MODE_QUIET) && !(global.mode & 
> MODE_VERBOSE)) {
> + close(0);
> + close(1);
> + }
> +
>   environ = check->envp;
>   extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, 
> ultoa_r(s->cur_sess, buf, sizeof(buf)));
>   execvp(px->check_command, check->argv);

So since this patch we got Andrey's report and figured other issues
caused by sharing FDs between the processes (namely the fact that
automatic FD deletion doesn't happen in epoll until the child closes
it as well, possibly causing polling loops or fake event reports).

Thus I went down the safe route for now, by closing *all* FDs. This is
fast enough, on my machine I can close around 17 million FDs per second.
However I liked Simon's approach above consisting in keeping stdin/stdout/
stderr open when in verbose mode. That's very handy to debug.

While testing this change, I discovered a very interesting issue. My
process would randomly segfault after a delay ranging from 1 to 10
seconds. The core files didn't make any sense, proving that the memory
was corrupted. In the end I found the culprit. The problem is random
signal delivery. In the signal handler, a call to task_wakeup() is
made, and this sometimes destroys the scheduler when it's already in
the signal handler, as the rq_next pointer is sometimes nulled then
dereferenced, and sometimes the ebtree structure is corrupted. So I
changed this to make use of the asynchronous signal delivery and now
it's OK since the functions are called after the poll loop.

This made me remember that Veiko who used to experience random segfaults
with zlib also has external checks configured and also had a non-sensical
core file, and zlib has a few orders of magniture higher latency than the
rest of the process, that makes it much more likely that a signal is
delivered here than anywhere else.

Thus I backported all of this to 1.6 planning for 1.6.6. It would be nice
if the people affected with issues could give it a try this week (either
from git or just wait for tomorrow morning to get the latest snapshot).

I also developped a patch to block all signals out of the poll loop. This
makes the code cleaner but I realized that it only improves the performance
for fork-limited workloads and will decrease it for regular workloads. So
for now it remains unmerged before I run more tests. On Linux we can use
epoll_pwait() to remove a few syscalls so the impact might be less important.
Anyway I suspect that over the longterm we'll go this way.

Thanks to all those who helped on this, and do not hesitate to report
any remaining issue!

Willy




[BUMP] Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-14 Thread Lukas Erlacher
BUMP

The patch looks good to me and should be merged.

Best,
Luke

On 08.06.2016 03:17, Simon Horman wrote:
> On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote:
>> On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote:
>>> You can always open /dev/null before chrooting and dup() it into FD 0 and 1 
>>> after chroot() has been called.
>>
>> I'd be more tempted to simply close those FDs after the fork(). That
>> may improve the ability to detect faulty scripts which try to dump
>> GBs of data.
>>
>> A very long time ago I've seen a health check perform an LDAP search
>> retrieving all the hundreds of thousands of members of a group, and
>> the people in charge for the server were complaining that the health
>> checks were hurting the server... Better have the script fail with a
>> broken pipe in this case.
>>
>> Just a suggestion.
> 
> Thanks, I think that is reasonable. I particularly like its simplicity.
> 
> Lukas, could you try this?
> 
> diff --git a/src/checks.c b/src/checks.c
> index c4ac947b6051..e65d28f7c3c6 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t)
>   if (pid == 0) {
>   /* Child */
>   extern char **environ;
> +
> + if ((global.mode & MODE_QUIET) && !(global.mode & 
> MODE_VERBOSE)) {
> + close(0);
> + close(1);
> + }
> +
>   environ = check->envp;
>   extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, 
> ultoa_r(s->cur_sess, buf, sizeof(buf)));
>   execvp(px->check_command, check->argv);
> 



smime.p7s
Description: S/MIME Cryptographic Signature


Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-09 Thread Lukas Erlacher
Ah, this is fun :D

Running this haproxy built from git:
> erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -vv
> HA-Proxy version 1.7-dev3-4b788f-35 2016/06/08
> Copyright 2000-2016 Willy Tarreau 
> 
> Build options :
>   TARGET  = linux2628
>   CPU = generic
>   CC  = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>   OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
> 
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> 
> Encrypted password support via crypt(3): yes
> Built with zlib version : 1.2.8
> Compression algorithms supported : identity("identity"), deflate("deflate"), 
> raw-deflate("deflate"), gzip("gzip")
> Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports prefer-server-ciphers : yes
> Built with PCRE version : 8.31 2012-07-06
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built without Lua support
> Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
> IP_FREEBIND
> 
> Available polling systems :
>   epoll : pref=300,  test result OK
>poll : pref=200,  test result OK
>  select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.
> 
> Available filters :
> [COMP] compression
> [TRACE] trace


Running this commandline:
> erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -d -f /etc/haproxy/haproxy.cfg 
>   
>   
>  

Results in the ldapcheck output going out to just regular stdout. So I'll have 
to run haproxy daemonized to reproduce the problem...
> erlacher@vmrbg81:~/haproxy$ sudo ./haproxy -f /etc/haproxy/haproxy.cfg

This results in the expected erroneous output on the first client connection.

Now to apply the patch and rebuild...

Running haproxy with -d still gives the check output on haproxy stdout, which 
is ok I suppose.

Running haproxy daemonized... gives no check output. Good!

So from this small test I believe the patch works.

Best,
Luke


On 08.06.2016 03:17, Simon Horman wrote:
> On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote:
>> On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote:
>>> You can always open /dev/null before chrooting and dup() it into FD 0 and 1 
>>> after chroot() has been called.
>>
>> I'd be more tempted to simply close those FDs after the fork(). That
>> may improve the ability to detect faulty scripts which try to dump
>> GBs of data.
>>
>> A very long time ago I've seen a health check perform an LDAP search
>> retrieving all the hundreds of thousands of members of a group, and
>> the people in charge for the server were complaining that the health
>> checks were hurting the server... Better have the script fail with a
>> broken pipe in this case.
>>
>> Just a suggestion.
> 
> Thanks, I think that is reasonable. I particularly like its simplicity.
> 
> Lukas, could you try this?
> 
> diff --git a/src/checks.c b/src/checks.c
> index c4ac947b6051..e65d28f7c3c6 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t)
>   if (pid == 0) {
>   /* Child */
>   extern char **environ;
> +
> + if ((global.mode & MODE_QUIET) && !(global.mode & 
> MODE_VERBOSE)) {
> + close(0);
> + close(1);
> + }
> +
>   environ = check->envp;
>   extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, 
> ultoa_r(s->cur_sess, buf, sizeof(buf)));
>   execvp(px->check_command, check->argv);
> 

-- 

Mit freundlichen Gruessen,
Lukas Erlacher

--
Rechnerbetriebsgruppe der Fakultäten Mathematik und Informatik
Raum 00.05.042
Tel. 089-289-18258
erlac...@in.tum.de
Technische Universität München - Boltzmannstr. 3 - 85748 Garching



smime.p7s
Description: S/MIME Cryptographic Signature


Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-07 Thread Simon Horman
On Tue, Jun 07, 2016 at 08:18:21PM +0200, Willy Tarreau wrote:
> On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote:
> > You can always open /dev/null before chrooting and dup() it into FD 0 and 1 
> > after chroot() has been called.
> 
> I'd be more tempted to simply close those FDs after the fork(). That
> may improve the ability to detect faulty scripts which try to dump
> GBs of data.
> 
> A very long time ago I've seen a health check perform an LDAP search
> retrieving all the hundreds of thousands of members of a group, and
> the people in charge for the server were complaining that the health
> checks were hurting the server... Better have the script fail with a
> broken pipe in this case.
> 
> Just a suggestion.

Thanks, I think that is reasonable. I particularly like its simplicity.

Lukas, could you try this?

diff --git a/src/checks.c b/src/checks.c
index c4ac947b6051..e65d28f7c3c6 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1836,6 +1836,12 @@ static int connect_proc_chk(struct task *t)
if (pid == 0) {
/* Child */
extern char **environ;
+
+   if ((global.mode & MODE_QUIET) && !(global.mode & 
MODE_VERBOSE)) {
+   close(0);
+   close(1);
+   }
+
environ = check->envp;
extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, 
ultoa_r(s->cur_sess, buf, sizeof(buf)));
execvp(px->check_command, check->argv);



Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-07 Thread Willy Tarreau
On Tue, Jun 07, 2016 at 12:01:31PM +0200, Benoit Garnier wrote:
> You can always open /dev/null before chrooting and dup() it into FD 0 and 1 
> after chroot() has been called.

I'd be more tempted to simply close those FDs after the fork(). That
may improve the ability to detect faulty scripts which try to dump
GBs of data.

A very long time ago I've seen a health check perform an LDAP search
retrieving all the hundreds of thousands of members of a group, and
the people in charge for the server were complaining that the health
checks were hurting the server... Better have the script fail with a
broken pipe in this case.

Just a suggestion.

Willy



Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-07 Thread Benoit Garnier
You can always open /dev/null before chrooting and dup() it into FD 0 and 1 
after chroot() has been called.

Le 7 juin 2016 06:23:04 GMT+02:00, Simon Horman  a écrit :
>Hi Cyril, Hi Lukas,
>
>On Mon, Jun 06, 2016 at 08:21:46PM +0200, Cyril Bonté wrote:
>> Hi Lukas,
>> 
>> I add Malcolm and Simon to the thread.
>> 
>> Le 06/06/2016 à 08:36, Lukas Erlacher a écrit :
>> >Additional info: The output only ends up in the *first* client
>connection.
>> >
>> >Talking about this with some colleagues we're now theorizing that
>stdout goes to fd 1 and fd 1 is also the first client connection
>socket. Might be helpful for tracking this down.
>> 
>> After doing a quick test, I can confirm I can reproduce the issue
>(once
>> working in daemon mode).
>> 
>> Simon, do you have time to work on a fix ? or should someone else do
>? (I
>> think I'll be available to work on this, but only by the end of the
>week)
>> 
>> Also Malcolm, as I remember you are using this feature, be aware that
>you
>> may hit the issue too.
>
>looking over the code a little the theory regarding fd 1 seems entirely
>plausible as stdout, along with stdin and stdin, may be be closed 
>in haproxy.c:main()
>
>It seems to me that if the case where they were closed, and thus fd 1
>and 2
>may be used for other purposes, it would be prudent to redirect fd 1
>and 2
>to "/dev/null".
>
>One problem I see with this is that if haproxy is running in a chroot
>and /dev/null is not present then open() will fail.
>
>Lukas, would it be possible for you to test the following
>(I have only compile tested it) ?
>
>
>diff --git a/src/checks.c b/src/checks.c
>index c4ac947b6051..eca7df62522f 100644
>--- a/src/checks.c
>+++ b/src/checks.c
>@@ -1836,6 +1836,16 @@ static int connect_proc_chk(struct task *t)
>   if (pid == 0) {
>   /* Child */
>   extern char **environ;
>+
>+  if ((global.mode & MODE_QUIET) && !(global.mode & 
>MODE_VERBOSE)) {
>+  close(0);
>+  close(1);
>+
>+  if (open("/dev/null", 0, O_WRONLY) || open("/dev/null", 
>0,
>O_WRONLY)) {
>+  exit (-1);
>+  }
>+  }
>+
>   environ = check->envp;
>   extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN,
>ultoa_r(s->cur_sess, buf, sizeof(buf)));
>   execvp(px->check_command, check->argv);

-- 
Envoyé de mon téléphone Android avec K-9 Mail. Excusez la brièveté.

Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-06 Thread Simon Horman
Hi Cyril, Hi Lukas,

On Mon, Jun 06, 2016 at 08:21:46PM +0200, Cyril Bonté wrote:
> Hi Lukas,
> 
> I add Malcolm and Simon to the thread.
> 
> Le 06/06/2016 à 08:36, Lukas Erlacher a écrit :
> >Additional info: The output only ends up in the *first* client connection.
> >
> >Talking about this with some colleagues we're now theorizing that stdout 
> >goes to fd 1 and fd 1 is also the first client connection socket. Might be 
> >helpful for tracking this down.
> 
> After doing a quick test, I can confirm I can reproduce the issue (once
> working in daemon mode).
> 
> Simon, do you have time to work on a fix ? or should someone else do ? (I
> think I'll be available to work on this, but only by the end of the week)
> 
> Also Malcolm, as I remember you are using this feature, be aware that you
> may hit the issue too.

looking over the code a little the theory regarding fd 1 seems entirely
plausible as stdout, along with stdin and stdin, may be be closed 
in haproxy.c:main()

It seems to me that if the case where they were closed, and thus fd 1 and 2
may be used for other purposes, it would be prudent to redirect fd 1 and 2
to "/dev/null".

One problem I see with this is that if haproxy is running in a chroot
and /dev/null is not present then open() will fail.

Lukas, would it be possible for you to test the following
(I have only compile tested it) ?


diff --git a/src/checks.c b/src/checks.c
index c4ac947b6051..eca7df62522f 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1836,6 +1836,16 @@ static int connect_proc_chk(struct task *t)
if (pid == 0) {
/* Child */
extern char **environ;
+
+   if ((global.mode & MODE_QUIET) && !(global.mode & 
MODE_VERBOSE)) {
+   close(0);
+   close(1);
+
+   if (open("/dev/null", 0, O_WRONLY) || open("/dev/null", 
0, O_WRONLY)) {
+   exit (-1);
+   }
+   }
+
environ = check->envp;
extchk_setenv(check, EXTCHK_HAPROXY_SERVER_CURCONN, 
ultoa_r(s->cur_sess, buf, sizeof(buf)));
execvp(px->check_command, check->argv);




Re: external-check stdout ends up in load-balanced traffic, destroying tcp sessions

2016-06-06 Thread Cyril Bonté

Hi Lukas,

I add Malcolm and Simon to the thread.

Le 06/06/2016 à 08:36, Lukas Erlacher a écrit :

Additional info: The output only ends up in the *first* client connection.

Talking about this with some colleagues we're now theorizing that stdout goes 
to fd 1 and fd 1 is also the first client connection socket. Might be helpful 
for tracking this down.


After doing a quick test, I can confirm I can reproduce the issue (once 
working in daemon mode).


Simon, do you have time to work on a fix ? or should someone else do ? 
(I think I'll be available to work on this, but only by the end of the week)


Also Malcolm, as I remember you are using this feature, be aware that 
you may hit the issue too.



--
Cyril Bonté