Re: haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-28 Thread PiBa-NL

Hi Christopher / Willy,

On Tue, Nov 28, 2017 at 10:28:20AM +0100, Christopher Faulet wrote:
> Here is a patch that should fix the deadlock. Could you confirm it fixes
> your bug ?

Fix confirmed.

Thanks,
PiBa-NL / Pieter



Re: haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-28 Thread Willy Tarreau
Hi Christopher,

On Tue, Nov 28, 2017 at 10:28:20AM +0100, Christopher Faulet wrote:
> Here is a patch that should fix the deadlock. Could you confirm it fixes
> your bug ?

At least it works for me: previously, a config with a mailer would
immediately go to 100% CPU once it reported a down backend; now it's OK.

> Emeric, this patch should be good, but take a look at it, just to be sure.

From what I've seen, the tcpcheck_main() function isn't called from
anywhere else, so removing the lock was indeed the best thing to do.

I'm a bit sad that the documentation of many functions still doesn't
state the conditions under which they may be called. That was already a
problem in the past, but it becomes critical with locking, as we must
always know whether a function must not, may, or must be called with a
lock held. This will definitely need to be documented, or such bugs will
happen routinely.

I've applied your fix, thanks!

Willy
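Willy's point about stating whether a function must not, may, or must be called with a lock held can also be made checkable in debug builds. The C11 sketch below is hypothetical and is not HAProxy's locking API: `owned_lock` records its holder's thread id, the callee's comment states its locking contract (the one the patch adopts for tcpcheck_main(): called with the server lock held, never taking it itself), and assertions turn both a violated precondition and a recursive acquisition into an immediate failure instead of a silent spin. Thread ids are plain positive ints passed in by the caller, an assumption made only to keep the example self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical debug lock that remembers its holder (0 = free,
 * otherwise the holder's thread id, which must be > 0). */
struct owned_lock {
	atomic_int owner;
};

static void owned_lock_init(struct owned_lock *l)
{
	atomic_init(&l->owner, 0);
}

static bool owned_lock_held_by(struct owned_lock *l, int tid)
{
	return atomic_load(&l->owner) == tid;
}

static void owned_lock_take(struct owned_lock *l, int tid)
{
	int expected;

	/* A plain spinlock would spin forever here; fail loudly instead. */
	assert(!owned_lock_held_by(l, tid));
	do {
		expected = 0; /* CAS overwrites <expected> on failure, so reset it */
	} while (!atomic_compare_exchange_weak(&l->owner, &expected, tid));
}

static void owned_lock_drop(struct owned_lock *l, int tid)
{
	assert(owned_lock_held_by(l, tid));
	atomic_store(&l->owner, 0);
}

/* Hypothetical check step.
 * Locking: MUST be called with <srv_lock> held by the calling thread.
 * It neither takes nor releases the lock itself. */
static int check_step(struct owned_lock *srv_lock, int tid)
{
	assert(owned_lock_held_by(srv_lock, tid));
	return 1; /* placeholder for the actual check work */
}

/* Hypothetical caller: takes the lock once around the whole check. */
static int run_check(struct owned_lock *srv_lock, int tid)
{
	int ret;

	owned_lock_take(srv_lock, tid);
	ret = check_step(srv_lock, tid);
	owned_lock_drop(srv_lock, tid);
	return ret;
}
```

With NDEBUG defined the assertions vanish, so this only catches contract violations during testing; the comment above check_step() is what documents the rule for readers.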



Re: haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-28 Thread Christopher Faulet

On 28/11/2017 at 07:25, Willy Tarreau wrote:
> Hi Pieter,
>
> On Mon, Nov 27, 2017 at 09:43:52PM +0100, PiBa-NL wrote:
>> Hi List,
>>
>> I thought I had 'reasonably' tested some of 1.8.0's options.
>> Today I put it into 'production' on my secondary cluster node and noticed it
>> takes 100% CPU...
>
> G.. bad. This sounds like another case of recursive locking.
>
>> I guess I should have tried such a thing last week.
>
> Don't worry, whatever the amount of tests you run, some bugs will always
> slip through.
>
>> Anyhow, below is some gdb and console output.
>
> Very useful, I found it :
>
> process_chk_conn() takes the lock then calls connect_conn_chk() :
>    2114  HA_SPIN_LOCK(SERVER_LOCK, >server->lock);
>    2137  ret = connect_conn_chk(t);
>
> connect_conn_chk() then calls tcpcheck_main() :
>    1548  tcpcheck_main(check);
>
> And this one takes the lock again :
>    2598  HA_SPIN_LOCK(SERVER_LOCK, >server->lock);
>
> CCing Emeric as he's the one who covered the checks so he will know best
> how to fix it.
>
> In the meantime, if you don't need threads you can rebuild with "USE_THREAD="
> to disable them, but I'd rather wait for a fix. Sorry about that, and thanks
> for the report.

Hi Pieter,

Here is a patch that should fix the deadlock. Could you confirm it fixes 
your bug ?


Emeric, this patch should be good, but take a look at it, just to be sure.

Thanks
--
Christopher Faulet
From 5fd4083becd141080ec8cf0923b222e0ae6119af Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Tue, 28 Nov 2017 10:06:29 +0100
Subject: [PATCH] BUG/MEDIUM: tcp-check: Don't lock the server in tcpcheck_main

There was a deadlock in the tcpcheck_main function. The server's lock was already
acquired by the caller (process_chk_conn or wake_srv_chk).

This patch must be backported in 1.8.
---
 src/checks.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/checks.c b/src/checks.c
index ad25ac094..63747201e 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2595,8 +2595,6 @@ static int tcpcheck_main(struct check *check)
 	struct list *head = check->tcpcheck_rules;
 	int retcode = 0;
 
-	HA_SPIN_LOCK(SERVER_LOCK, >server->lock);
-
 	/* here, we know that the check is complete or that it failed */
 	if (check->result != CHK_RES_UNKNOWN)
 		goto out_end_tcpcheck;
@@ -2637,7 +2635,7 @@ static int tcpcheck_main(struct check *check)
 			if (s->proxy->timeout.check)
 				t->expire = tick_first(t->expire, t_con);
 		}
-		goto out_unlock;
+		goto out;
 	}
 
 	/* special case: option tcp-check with no rule, a connect is enough */
@@ -2732,7 +2730,7 @@ static int tcpcheck_main(struct check *check)
 	chunk_appendf(, " comment: '%s'", comment);
 				set_server_check_status(check, HCHK_STATUS_SOCKERR, trash.str);
 				check->current_step = NULL;
-				goto out_unlock;
+				goto out;
 			}
 
 			if (check->cs)
@@ -2854,7 +2852,7 @@ static int tcpcheck_main(struct check *check)
 					if (s->proxy->timeout.check)
 						t->expire = tick_first(t->expire, t_con);
 				}
-				goto out_unlock;
+				goto out;
 			}
 
 		} /* end 'connect' */
@@ -3059,7 +3057,7 @@ static int tcpcheck_main(struct check *check)
 	if (>current_step->list != head &&
 	check->current_step->action == TCPCHK_ACT_EXPECT)
 		__cs_want_recv(cs);
-	goto out_unlock;
+	goto out;
 
  out_end_tcpcheck:
 	/* collect possible new errors */
@@ -3074,8 +3072,7 @@ static int tcpcheck_main(struct check *check)
 
 	__cs_stop_both(cs);
 
- out_unlock:
-	HA_SPIN_UNLOCK(SERVER_LOCK, >server->lock);
+ out:
 	return retcode;
 }
 
-- 
2.13.6



Re: haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-27 Thread Willy Tarreau
Hi Pieter,

On Mon, Nov 27, 2017 at 09:43:52PM +0100, PiBa-NL wrote:
> Hi List,
> 
> I thought I had 'reasonably' tested some of 1.8.0's options.
> Today I put it into 'production' on my secondary cluster node and noticed it
> takes 100% CPU...

G.. bad. This sounds like another case of recursive locking.

> I guess I should have tried such a thing last week.

Don't worry, whatever the amount of tests you run, some bugs will always
slip through.

> Anyhow, below is some gdb and console output.

Very useful, I found it :

process_chk_conn() takes the lock then calls connect_conn_chk() :
  2114  HA_SPIN_LOCK(SERVER_LOCK, >server->lock);
  2137  ret = connect_conn_chk(t);

connect_conn_chk() then calls tcpcheck_main() :
  1548  tcpcheck_main(check);

And this one takes the lock again :
  2598  HA_SPIN_LOCK(SERVER_LOCK, >server->lock);

CCing Emeric as he's the one who covered the checks so he will know best
how to fix it.

In the meantime, if you don't need threads you can rebuild with "USE_THREAD="
to disable them, but I'd rather wait for a fix. Sorry about that, and thanks
for the report.

Willy
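The call chain traced above also explains why the symptom is 100% CPU rather than an idle hang: a spinlock waits by busy-looping, and a plain spinlock does not record which thread holds it, so a second acquisition by the thread that already owns it can never succeed. The C11 sketch below is a hypothetical minimal spinlock, not HAProxy's HA_SPIN_LOCK() implementation. Its `nested_acquire_succeeds()` replays the process_chk_conn() then tcpcheck_main() sequence, using a single trylock for the inner acquisition so that it returns instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock for illustration only; this is NOT the
 * implementation behind HAProxy's HA_SPIN_LOCK() macro. */
struct spinlock {
	atomic_bool locked;
};

static void spin_init(struct spinlock *l)
{
	atomic_init(&l->locked, false);
}

/* One acquisition attempt: true if we took the lock, false if it was held. */
static bool spin_trylock(struct spinlock *l)
{
	return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Busy-waits until the lock is free. The lock does not know who holds it,
 * so if the calling thread already holds it, this loop never exits and
 * keeps one CPU at 100%. */
static void spin_lock(struct spinlock *l)
{
	while (!spin_trylock(l))
		; /* spin */
}

static void spin_unlock(struct spinlock *l)
{
	assert(atomic_load(&l->locked)); /* releasing a free lock is a bug */
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Replays the 1.8.0 call chain. The inner acquisition is a single trylock
 * instead of spin_lock() so that this function returns rather than hanging.
 * Returns true if the nested acquisition could ever have succeeded. */
static bool nested_acquire_succeeds(void)
{
	struct spinlock server_lock;
	bool inner;

	spin_init(&server_lock);
	spin_lock(&server_lock);              /* outer: process_chk_conn() */
	inner = spin_trylock(&server_lock);   /* inner: old tcpcheck_main() */
	if (inner)
		spin_unlock(&server_lock);
	spin_unlock(&server_lock);
	return inner;
}
```

A real spin_lock() in place of that inner trylock would never return, which matches the reported symptom; the patch removes the inner acquisition altogether, so the lock is taken exactly once by the caller.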



haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-27 Thread PiBa-NL

Hi List,

I thought I had 'reasonably' tested some of 1.8.0's options.
Today I put it into 'production' on my secondary cluster node and noticed
it takes 100% CPU... I guess I should have tried such a thing last week.
My regular config, with 10 frontends and 13 servers in total, seems to
start up fine when 'email-alert level' is set to 'emerg', since it doesn't
need to send a mail then.


Anyhow, below is some gdb and console output.
The config that reproduces it is pretty simple; no new features are used.
The server is 'down', though, so HAProxy tries to send a mail about it,
but that never seems to happen: no mail is received.


I tried using nokqueue and nopoll, but that did not bring any
improvement.


Is there anything else I can provide?

Regards,
PiBa-NL / Pieter

haproxy -f /root/hap.conf -V
[WARNING] 330/204605 (14771) : config : missing timeouts for frontend 'TestMailFront'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 330/204605 (14771) : config : missing timeouts for backend 'TestMailBack'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Note: setting global.maxconn to 2000.
Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
Using kqueue() as the polling mechanism.
[WARNING] 330/204608 (14771) : Server TestMailBack/TestServer is DOWN, reason: Layer4 timeout, check duration: 2009ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

[ALERT] 330/204608 (14771) : backend 'TestMailBack' has no server available!

Complete configuration that reproduces the issue:

mailers globalmailers
    mailer ex01 192.168.0.40:25
frontend TestMailFront
    bind :88
    default_backend  TestMailBack
backend TestMailBack
    server TestServer 192.168.0.250:80 check
    email-alert mailers            globalmailers
    email-alert level            info
    email-alert from            haproxy@me.local
    email-alert to            m...@me.tld
    email-alert myhostname        pfs


root@:~ # haproxy -vv
HA-Proxy version 1.8.0 2017/11/26
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = freebsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -pipe -g -fstack-protector -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe

root@:~ #

root@:~ # /usr/local/bin/gdb --pid 14771
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Type "apropos word"