Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread Willy Tarreau
On Fri, Oct 12, 2018 at 02:29:51PM +0200, PiBa-NL wrote:
> On 12-10-2018 at 10:53, William Lallemand wrote:
> > The attached patch should fix the issue.
> 
> The patch works for me, thanks.

Great, patch now merged.

Thanks!
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread PiBa-NL

Hi William,

On 12-10-2018 at 10:53, William Lallemand wrote:
> The attached patch should fix the issue.

The patch works for me, thanks.

Regards,

PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread William Lallemand
On Thu, Oct 11, 2018 at 11:06:38PM +0200, William Lallemand wrote:
> On Thu, Oct 11, 2018 at 09:31:48PM +0200, PiBa-NL wrote:
> > Hi Willy, William,
> > 
> 
> Hi Peter,
> 
> Regarding this part:
>  
> > -Connection and request counters too low when run as a regtest from
> > varnishtest (bug?)
> > It turns out that starting haproxy from varnishtest, and using -W
> > master-worker mode, actually creates 2 processes that are handling
> > traffic. That explains why a large part of the connections isn't seen by
> > the other haproxy instance and why the stats show too low a number of
> > connections. Bisecting, it seems to fail on this commit: b3f2be3,
> > perhaps @William can you take a look at it? Not really sure when this
> > occurs in a 'real' environment; it doesn't seem to happen when manually
> > running haproxy -W, but it's still strange that this occurs when
> > varnishtest is calling haproxy.
> > 
> 
> There was an exception in the master's code for FDs inherited with fd@,
> which is what varnishtest uses. In 1.8, a flag prevented the master from
> closing those FDs.
> 
> Now there is a polling loop in the master, and I think a side effect is
> that the FD gets registered in the master's polling loop.
> 
> It's just a guess, I'll look at it tomorrow.
> 
> Cheers,
> 
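
The suspected side effect can be illustrated outside haproxy. What follows is a minimal, Linux-only sketch, not haproxy code: a pipe's read end stands in for a listening socket inherited with fd@, and demo_unbind_without_close() is a hypothetical helper. While the FD is registered in a process's epoll set, that process is woken up whenever the FD becomes readable (for a listener, on every incoming connection), so it competes for the traffic. Removing it with EPOLL_CTL_DEL stops those wakeups, while the descriptor itself stays open for whoever still owns it.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Number of ready descriptors epoll reports within timeout_ms (-1 on error). */
static int count_ready(int epfd, int timeout_ms)
{
	struct epoll_event events[4];

	return epoll_wait(epfd, events, 4, timeout_ms);
}

/* Hypothetical demo helper, not haproxy code. A pipe's read end stands in
 * for a listening socket inherited with fd@. On success it returns 0 and
 * reports how many events were seen while the FD was registered (*before),
 * after it was removed from the poller (*after), and whether the FD is
 * still open afterwards (*still_open). Returns -1 if setup fails. */
static int demo_unbind_without_close(int *before, int *after, int *still_open)
{
	int p[2];
	int epfd;
	int ret = -1;
	struct epoll_event ev = { .events = EPOLLIN };

	assert(before && after && still_open);

	if (pipe(p) < 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.data.fd = p[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
		goto out_all;

	/* make the read end readable, like a pending incoming connection */
	if (write(p[1], "x", 1) != 1)
		goto out_all;

	/* registered: this process is woken up for the "traffic" */
	*before = count_ready(epfd, 100);

	/* "unbind without close": drop the FD from the poller only */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) < 0)
		goto out_all;
	*after = count_ready(epfd, 0);

	/* the descriptor itself is still valid */
	*still_open = (fcntl(p[0], F_GETFD) != -1);
	ret = 0;

out_all:
	close(epfd);
out_pipe:
	close(p[0]);
	close(p[1]);
	return ret;
}
```

Because epoll is level-triggered by default, the unread byte keeps the FD ready, so the helper should see one event before EPOLL_CTL_DEL, none after it, and a still-valid FD.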

The attached patch should fix the issue.

-- 
William Lallemand
From 3f2c30a0f15e30b7941245d3c5b20cbd5561489f Mon Sep 17 00:00:00 2001
From: William Lallemand 
Date: Fri, 12 Oct 2018 10:39:54 +0200
Subject: [PATCH] BUG/MEDIUM: mworker: don't poll on LI_O_INHERITED listeners

The listeners with the LI_O_INHERITED flag were deleted but not unbound,
which is a problem now that the master has a polling loop.

This patch unbinds every listener which is not required by the master,
but does not close the FD of those that have the LI_O_INHERITED flag.
---
 src/haproxy.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index a7b07a267..82da86222 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -615,13 +615,17 @@ static void mworker_cleanlisteners()
 
 	for (curproxy = proxies_list; curproxy; curproxy = curproxy->next) {
 		list_for_each_entry_safe(l, l_next, &curproxy->conf.listeners, by_fe) {
-			/* does not close if the FD is inherited with fd@
-			 * from the parent process */
-			if (!(l->options & (LI_O_INHERITED|LI_O_MWORKER)))
-				unbind_listener(l);
 			/* remove the listener, but not those we need in the master... */
-			if (!(l->options & LI_O_MWORKER))
+			if (!(l->options & LI_O_MWORKER)) {
+				/* unbind the listener but does not close if
+				   the FD is inherited with fd@ from the parent
+				   process */
+				if (l->options & LI_O_INHERITED)
+					unbind_listener_no_close(l);
+				else
+					unbind_listener(l);
 				delete_listener(l);
+			}
 		}
 	}
 }
-- 
2.16.4
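
As a reading aid, the per-listener decision made by the patched loop can be written as a pure function. This is a simplified sketch, not haproxy code: the LI_O_* values below are placeholders (haproxy defines its own constants), and cleanup_action_for() and old_code_unbinds() are hypothetical names. The only case whose outcome changes is an inherited listener that the master does not need: the old code deleted it without unbinding it, while the new code unbinds it and leaves its FD open.

```c
#include <assert.h>

/* Placeholder bit values for illustration only: haproxy's real LI_O_*
 * constants are defined in its own headers and may differ. */
#define LI_O_INHERITED 0x1u
#define LI_O_MWORKER   0x2u

static_assert((LI_O_INHERITED & LI_O_MWORKER) == 0, "flags must be distinct bits");

enum cleanup_action {
	KEEP_IN_MASTER,   /* needed by the master: left untouched */
	UNBIND_KEEP_FD,   /* removed from polling and deleted, FD left open */
	UNBIND_CLOSE_FD,  /* removed from polling, deleted and FD closed */
};

/* Hypothetical helper mirroring the branches of the patched
 * mworker_cleanlisteners(). */
static enum cleanup_action cleanup_action_for(unsigned int options)
{
	if (options & LI_O_MWORKER)
		return KEEP_IN_MASTER;
	if (options & LI_O_INHERITED)
		return UNBIND_KEEP_FD;
	return UNBIND_CLOSE_FD;
}

/* Hypothetical helper: returns 1 when the pre-patch code called
 * unbind_listener(). Inherited listeners were skipped, so they stayed
 * registered in the master's poller even after delete_listener(). */
static int old_code_unbinds(unsigned int options)
{
	return !(options & (LI_O_INHERITED | LI_O_MWORKER));
}
```

In haproxy itself each branch maps to direct calls to unbind_listener_no_close(), unbind_listener() and delete_listener(), as in the diff; keeping the decision separate here just makes each case easy to check on its own.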



Re: High CPU Usage followed by segfault error

2018-10-12 Thread Soji Antony
Hi Oliver,

Thanks for the suggestion. We have upgraded haproxy to 1.8.14 but are seeing
the same CPU issue again.
I have found that the segmentation fault we were seeing earlier is
not related to the CPU spike, as it happens at a different time. Recently
we had the same issue with one of our haproxy servers and found the
following in the strace output:

# haproxy -vv

HA-Proxy version 1.8.14-1ppa1~trusty 2018/09/23
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

strace output:

[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] sched_yield( <unfinished ...>
[pid 114267] sched_yield( <unfinished ...>
[pid 114266] <... sched_yield resumed> ) = 0
[pid 114265] <... sched_yield resumed> ) = 0
[pid 114267] <... sched_yield resumed> ) = 0
[pid 114266] sched_yield( <unfinished ...>
[pid 114265] sched_yield( <unfinished ...>

kernel.log

Oct 10 19:13:04 int16 kernel: [192997.62] sched: RT throttling activated
Oct 10 19:16:28 int16 kernel: [193201.140115] INFO: task :1213 blocked for more than 120 seconds.
Oct 10 19:16:28 int16 kernel: [193201.144250]   Tainted: G   OE
Oct 10 19:16:28 int16 kernel: [193201.147927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 10 19:16:28 int16 kernel: [193201.152389]  880768c878a8 880768c87968 880766ae3700 880768c88000
Oct 10 19:16:28 int16 kernel: [193201.152392]  7fff 88078ffd50f0 817f1700
Oct 10 19:16:28 int16 kernel: [193201.152394]  880768c878c0 817f0fb5 88076f917200 880768c87970
Oct 10 19:16:28 int16 kernel: [193201.152396] Call Trace:
Oct 10 19:16:28 int16 kernel: [193201.152402]  [] ? bit_wait+0x50/0x50
Oct 10 19:16:28 int16 kernel: [193201.152404]  [] schedule+0x35/0x80
Oct 10 19:16:28 int16 kernel: [193201.152418]  [] schedule_timeout+0x23b/0x2d0
Oct 10 19:16:28 int16 kernel: [193201.152430]  [] ? xen_clocksource_read+0x15/0x20
Oct 10 19:16:28 int16 kernel: [193201.152438]  [] ? sched_clock+0x9/0x10
Oct 10 19:16:28 int16 kernel: [193201.152441]  [] ?