Re: [PATCH] REGTEST: stick-tables: Test expiration when used with table_*

2018-06-20 Thread Willy Tarreau
Hi Daniel,

On Wed, Jun 20, 2018 at 10:28:43AM -0400, Daniel Corbett wrote:
> +shell -expect "used:0" {
> +echo "show table http1" |socat ${tmpdir}/h1/stats.sock -

This is the point where we will start to need to organize the reg tests
better. First, socat is rarely installed by default, so we'll have to
mention that it's required. Second, socat introduces a half-second delay
before quitting, making it impractical for the quick automated tests that
we expect developers to run frequently.
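For what it's worth, most of that delay is simply socat's default linger of
0.5s after EOF, which is tunable with -t. Something like this might shave it
off, though I have not checked that the CLI response is still read completely,
so take it as an untested idea :

    echo "show table http1" | socat -t0.05 ${tmpdir}/h1/stats.sock -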

The dependency on socat makes me think we could probably put all such
tests in a specific sub-directory. However, I expect that we will also
create a number of other tests which will be slower than average and
unrelated to the CLI.

Maybe we could simply introduce levels :
  - level 1 (the default) would contain only the immediate tests that cover
the internal state machine and HTTP compliance (the things we break the
most often by side effects when fixing a bug in the same area). Basically
we should expect to be able to run 100 tests in a second there, and there
should be zero excuse for not running them before committing a patch
affecting a sensitive area.

  - level 2 would cover some extra parts requiring a bit more time (eg:
CLI commands, horrible stuff involving tcploop) and would probably
be needed only when trying to ensure that a fix doesn't break
something unexpected.

  - level 3 would be the painful one that we already know nobody will dare
to run. They would cover timeouts, health checks, etc. All the stuff
that takes multiple seconds per test would be there. They may occasionally
be run by a dev during lunch time, or at night by automated bots.

Then we could issue "make reg-tests" to run level 1 by default or
"make reg-tests LEVEL=" for the other ones. The idea is that I would
*really* like to encourage developers to run some basic tests before sending
patches, and we all know that none of us will agree to run them if they take
longer than the time it takes to get distracted (i.e. if you have time to
switch to reading your mails while the test runs, we won't run them because
this will create a distraction).
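For example, assuming we group the tests into per-level sub-directories (say
reg-tests/lvl1, lvl2, lvl3 -- layout and naming to be discussed), the target
could look more or less like this, only to illustrate the idea, not tested at
all :

    LEVEL ?= 1

    reg-tests:
	@find reg-tests/lvl$(LEVEL) -name '*.vtc' -exec varnishtest {} +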

Also an important point, since I'm seeing that the first tests focus on
known bugs (and I know that's often the point of regression testing) :
we've never reintroduced a bug that was already fixed, and it's likely
the same in most other projects. The reason is simply human nature : we
learn from our mistakes. So in practice, focusing on verifying that one
very specific bug doesn't exist anymore is pointless, because there is
zero chance that the exact same bug will re-appear ; too precise a test
will just waste development time without bringing value.

HOWEVER, each such bug belongs to a *class* of bugs which are very
likely to be hit many times in various ways. This means two things :
  1) each test must be written in a way that tries to cover the widest
 spectrum of causes for a single class ;

  2) when writing such a test, if other very closely related variants
 are imagined, it's worth adding all these variants at once even
 if they cover bugs that have not been met yet.

In your case, the test covers large parts (i.e. case 1 above), but only
focuses on "used:0". I think it's worth checking what else in the output
could possibly break. Given the number of entries tracked, there are very
likely a number of values set at once which deserve being matched, because
we know what to expect there. By making the check cover them as well, we
increase the likelihood that this test will detect a regression affecting
stick tables, thus the usefulness of the test, and hence the likelihood that
a developer will want to run it.
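For instance, instead of matching only "used:0", the shell check could match
the whole header line that "show table" dumps, something in this spirit (to be
adjusted to the exact output, I'm writing it from memory) :

    shell -expect "table: http1, type: ip, size:1024, used:0" {
        echo "show table http1" | socat ${tmpdir}/h1/stats.sock -
    }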

Then maybe you'll find that the request made doesn't trigger certain fields
and that we should slightly improve it to increase the coverage. This will
result in a test that still detects the bug you wanted to cover, but as part
of a class of bugs and not just this specific one. I suspect that for this
it will make sense to implement the server part to provide a valid response
in order to get more fields filled.
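Something as simple as this at the top of the vtc would likely be enough
(untested sketch) :

    server s1 {
        rxreq
        txresp
    } -start

with a backend in h1 pointing to ${s1_addr}:${s1_port}, and the client doing an
rxresp after its txreq, so that a complete transaction goes through and more of
the tracked counters get a chance to move.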

Just my two cents,
Willy



tcp-check expect with exclamation mark

2018-06-20 Thread Dmitriy Kuzmin
Greetings

I’m using haproxy to load-balance read-only queries between redis slaves.
I want to use the health check system to exclude from load balancing the
slaves that are in the process of syncing with the master.
The idea is to look for the string “master_sync_in_progress:1” in the response to
“info replication”.
If this string is found, the backend should be marked as down.

I’m trying to use “tcp-check expect ! string” (with an exclamation mark [!])
to get this working, but the backends are permanently down regardless of the
sync status.
During sync (the slave’s response contains “master_sync_in_progress:1”) the
health check status is “L7RSP,TCPCHK matched unwanted content
'master_sync_in_progress:1' at step 5”.
However, when the slave’s response contains “master_sync_in_progress:0” (sync
finished) the health check status is “L7TOUT, at step 5 of tcp-check (expect
string 'master_sync_in_progress:1')”.

Does negation with exclamation mark (!) work with tcp-check at all?

I’ve observed this behaviour in haproxy 1.5.18, 1.7.11 and 1.8.9


Example configuration:

global
log   127.0.0.1 local2
chroot    /var/lib/haproxy
pidfile   /var/run/haproxy.pid
maxconn   4000
user  haproxy
group haproxy
daemon

defaults
mode    tcp
log global
option  tcplog
option  dontlognull
option  redispatch
retries 3
timeout http-request    10s
timeout queue   1m
timeout connect 10s
timeout client  1m
timeout server  1m
timeout http-keep-alive 10s
timeout check   10s
maxconn 3000

frontend redis_reads_ft
bind    /var/run/haproxy/redis_reads
use_backend redis_reads_bk

backend redis_reads_bk
option log-health-checks
balance   roundrobin

option    tcp-check
tcp-check connect
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect ! string master_sync_in_progress:1
tcp-check send QUIT\r\n
tcp-check expect string +OK

server sc-redis1_63811 10.10.68.61:63811 check
server sc-redis1_63812 10.10.68.61:63812 check
server sc-redis1_63813 10.10.68.61:63813 check


Best regards,
Dmitriy Kuzmin


Re: how to run vtc files?

2018-06-20 Thread Willy Tarreau
On Wed, Jun 20, 2018 at 04:39:36PM -0400, Daniel Corbett wrote:
> Hello,
> 
> 
> On 06/20/2018 03:34 PM, Илья Шипицин wrote:
> > hi
> > 
> > 
> > [ilia@localhost haproxy]$ HAPROXY_PROGRAM=./haproxy varnishtest
> > reg-tests/ssl/h0.vtc
> >  top   0.0 extmacro def pwd=/home/ilia/xxx/haproxy
> >  top   0.0 extmacro def localhost=127.0.0.1
> >  top   0.0 extmacro def bad_backend=127.0.0.1 36769
> >  top   0.0 extmacro def bad_ip=192.0.2.255
> >  top   0.0 macro def tmpdir=/tmp/vtc.31222.33cfe809
> > *    top   0.0 TEST reg-tests/ssl/h0.vtc starting
> > **   top   0.0 === varnishtest "OpenSSL bug: Random crashes"
> > *    top   0.0 TEST OpenSSL bug: Random crashes
> > **   top   0.0 === feature ignore_unknown_macro
> > **   top   0.0 === haproxy h1 -conf {
> >  top   0.0 Unknown command: "haproxy"
> > *    top   0.0 RESETTING after reg-tests/ssl/h0.vtc
> > *    top   0.0 TEST reg-tests/ssl/h0.vtc FAILED
> > #    top  TEST reg-tests/ssl/h0.vtc FAILED (0.001) exit=2
> > [ilia@localhost haproxy]$
> 
> Something like this should work:
> 
> $ HAPROXY_PROGRAM=$PWD/haproxy varnishtest reg-tests/ssl/h0.vtc
> #    top  TEST reg-tests/ssl/h0.vtc passed (0.147)

This just makes me think that we should probably also set HAPROXY_PROGRAM
in the "reg-tests" target in the makefile to point to the current haproxy
executable, roughly like the change below. Fred, what do you think ?

 reg-tests:
...
+   @export HAPROXY_PROGRAM=$${HAPROXY_PROGRAM:-$$PWD/haproxy}; \
@find reg-tests ...
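That way a plain "make reg-tests" would keep working out of the box, and one
could still point it at another binary when needed, e.g. :

    HAPROXY_PROGRAM=/opt/haproxy-dev/haproxy make reg-tests

since the value from the environment takes precedence over the default.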

Willy



Re: Issue with parsing DNS from AWS

2018-06-20 Thread Jim Deville
Attaching an anonymized PCAP from yesterday. The first two packets are the 
request and response for 4 servers, the second pair is the request and response 
for 3. The 3-server response parses successfully, and Jonathan was able to find 
that the 4-server response ends up hitting this line in dns.c: 
https://github.com/haproxy/haproxy/blob/master/src/dns.c#L425.


I'd be happy with any workaround or explanation of what we could do differently, 
and happy to help gather more info or to try out a patch in our environment to 
confirm a fix, if this is a bug as it seems.


Jim


From: Jim Deville
Sent: Tuesday, June 19, 2018 6:00:07 PM
To: haproxy@formilux.org
Cc: Jonathan Works
Subject: Issue with parsing DNS from AWS


We have a setup with ECS and AWS's Service Discovery being load balanced by 
HAProxy in order to support sticky sessions for WebSocket handshakes, and we're 
working on making it more efficient by upgrading to 1.8.9 and taking advantage 
of seamless reloads and DNS service discovery. We have a solution almost 
working, however, we're seeing an issue during scaling when the DNS response 
crosses a certain size.


We're using the following config (anonymized): 
https://gist.github.com/jredville/523de951d5ab6b60a0d345516bcf46d4

What we're seeing is:
  * if we bring up 3 target servers, they come up as healthy, and traffic is 
routed appropriately. If we restart haproxy, it comes up healthy
  * if we then scale to 4 or more servers, the 4th and additional servers are 
never recognized; however, the first 3 stay healthy
  * if we restart haproxy with 4 or more servers, no servers come up healthy

We've attempted to modify the init-addr setting, accepted_payload_size, and check 
options, and we've tried with and without a server-template, and this is the 
behavior we consistently see. If we run strace on haproxy, we see it making 
the DNS requests but never updating the state of the servers. At this point 
we're not sure whether we have something wrong in the config or there is a bug 
in how haproxy parses responses from AWS. Jonathan (cc'd) has pcaps if that 
would be helpful as well.
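For reference, the combination we're describing is roughly of this shape
(heavily simplified and with made-up names, purely to illustrate; the real
anonymized config is in the gist linked above):

resolvers awsdns
    nameserver ns1 10.0.0.2:53
    accepted_payload_size 8192
    hold valid 10s

backend websocket_bk
    balance roundrobin
    server-template srv 4 _myservice._tcp.servicediscovery.local resolvers awsdns init-addr none check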

Thanks,
Jim


haproxy-dns-srv.pcap
Description: haproxy-dns-srv.pcap


Re: how to run vtc files?

2018-06-20 Thread Daniel Corbett

Hello,


On 06/20/2018 03:34 PM, Илья Шипицин wrote:

hi


[ilia@localhost haproxy]$ HAPROXY_PROGRAM=./haproxy varnishtest 
reg-tests/ssl/h0.vtc

 top   0.0 extmacro def pwd=/home/ilia/xxx/haproxy
 top   0.0 extmacro def localhost=127.0.0.1
 top   0.0 extmacro def bad_backend=127.0.0.1 36769
 top   0.0 extmacro def bad_ip=192.0.2.255
 top   0.0 macro def tmpdir=/tmp/vtc.31222.33cfe809
*    top   0.0 TEST reg-tests/ssl/h0.vtc starting
**   top   0.0 === varnishtest "OpenSSL bug: Random crashes"
*    top   0.0 TEST OpenSSL bug: Random crashes
**   top   0.0 === feature ignore_unknown_macro
**   top   0.0 === haproxy h1 -conf {
 top   0.0 Unknown command: "haproxy"
*    top   0.0 RESETTING after reg-tests/ssl/h0.vtc
*    top   0.0 TEST reg-tests/ssl/h0.vtc FAILED
#    top  TEST reg-tests/ssl/h0.vtc FAILED (0.001) exit=2
[ilia@localhost haproxy]$


Something like this should work:

$ HAPROXY_PROGRAM=$PWD/haproxy varnishtest reg-tests/ssl/h0.vtc
#    top  TEST reg-tests/ssl/h0.vtc passed (0.147)


Thanks,
-- Daniel




how to run vtc files?

2018-06-20 Thread Илья Шипицин
hi


[ilia@localhost haproxy]$ HAPROXY_PROGRAM=./haproxy varnishtest
reg-tests/ssl/h0.vtc
 top   0.0 extmacro def pwd=/home/ilia/xxx/haproxy
 top   0.0 extmacro def localhost=127.0.0.1
 top   0.0 extmacro def bad_backend=127.0.0.1 36769
 top   0.0 extmacro def bad_ip=192.0.2.255
 top   0.0 macro def tmpdir=/tmp/vtc.31222.33cfe809
*top   0.0 TEST reg-tests/ssl/h0.vtc starting
**   top   0.0 === varnishtest "OpenSSL bug: Random crashes"
*top   0.0 TEST OpenSSL bug: Random crashes
**   top   0.0 === feature ignore_unknown_macro
**   top   0.0 === haproxy h1 -conf {
 top   0.0 Unknown command: "haproxy"
*top   0.0 RESETTING after reg-tests/ssl/h0.vtc
*top   0.0 TEST reg-tests/ssl/h0.vtc FAILED
#top  TEST reg-tests/ssl/h0.vtc FAILED (0.001) exit=2
[ilia@localhost haproxy]$


Re: remaining process after (seamless) reload

2018-06-20 Thread Willy Tarreau
On Wed, Jun 20, 2018 at 04:42:58PM +0200, Christopher Faulet wrote:
> When HAProxy is shutting down, it exits the polling loop when there are no jobs
> left (jobs == 0). When threads are not used, this works pretty well, but when
> HAProxy is started with several threads, a thread can decide to exit because the
> jobs variable reached 0 while another one is still processing a task (e.g. a
> health check). At this stage, the running thread could decide to request a
> synchronization. But because at least one of the threads is already gone, the
> others will wait indefinitely in the sync point and the process will never die.

Just a comment on this last sentence, I think this is the root cause of the
problem : a thread quits and its thread_mask bit doesn't disappear. In my
opinion, if we're looping, it's precisely because there's no way, by looking
at all_threads_mask, to tell that some threads are missing. Thus I think that
a more reliable long-term solution would require that each exiting thread does
at least "all_threads_mask &= ~tid_bit".

Now I have no idea whether or not the current sync point code is compatible
with this nor if this will be sufficient, but I'm pretty sure that over time
we'll have to go this way to fix this inconsistency.
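To illustrate, I mean something along these lines at the end of
run_thread_poll_loop(), just before the thread returns (a sketch only, not even
compile-tested, and the interaction with the sync point remains to be checked) :

	/* this thread is leaving : clear its bit from the set of running
	 * threads so that nobody waits for it at the sync point anymore.
	 */
	HA_ATOMIC_AND(&all_threads_mask, ~tid_bit);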

Willy



Re: remaining process after (seamless) reload

2018-06-20 Thread Christopher Faulet

Le 20/06/2018 à 15:11, William Dauchy a écrit :

Hello Christopher,

Thank you for the quick answer and the patch.

On Wed, Jun 20, 2018 at 11:32 AM Christopher Faulet  wrote:

Here is a patch to avoid a thread to exit its polling loop while others
are waiting in the sync point. It is a theoretical patch because I was
not able to reproduce the bug.
Could you check if it fixes it please ?


Unfortunately, it does not fix the issue; I am getting the same backtrace:
#0  0x55f30960a10b in thread_sync_barrier (barrier=0x55f309875528
) at src/hathreads.c:114
#1  thread_enter_sync () at src/hathreads.c:127
#2  0x55f3095b27a2 in sync_poll_loop () at src/haproxy.c:2376
#3  run_poll_loop () at src/haproxy.c:2433
#4  run_thread_poll_loop (data=data@entry=0x55f31c4b0c50) at src/haproxy.c:2463
#5  0x55f30952e856 in main (argc=, argv=) at src/haproxy.c:3065

however the backtrace is different on some machines:

#0  0x7f31fd89ff57 in pthread_join () from /lib64/libpthread.so.0
#1  0x55e184a5486a in main (argc=, argv=) at src/haproxy.c:3069



Hum, ok, forget the previous patch. Here is a second try. It solves the 
same bug in another way: in this patch, all threads must enter the 
sync point in order to exit. I hope it will do the trick.


--
Christopher Faulet
>From c01f5636a0cbe2be18573e455370c4a47f84d59e Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 20 Jun 2018 16:22:03 +0200
Subject: [PATCH] BUG/MEDIUM: threads: Use the sync point to check active jobs
 and exit

When HAProxy is shutting down, it exits the polling loop when there are no jobs
left (jobs == 0). When threads are not used, this works pretty well, but when
HAProxy is started with several threads, a thread can decide to exit because the
jobs variable reached 0 while another one is still processing a task (e.g. a
health check). At this stage, the running thread could decide to request a
synchronization. But because at least one of the threads is already gone, the
others will wait indefinitely in the sync point and the process will never die.

To fix the bug, when the first thread (and only this one) detects that there are
no active jobs left, it requests a synchronization. And in the sync point, all
threads check whether the jobs variable reached 0 to exit the polling loop.

This patch must be backported to 1.8.
---
 src/haproxy.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index 1c0455c56..5906f7989 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2372,10 +2372,12 @@ void mworker_pipe_register()
 	fd_want_recv(mworker_pipe[0]);
 }
 
-static void sync_poll_loop()
+static int sync_poll_loop()
 {
+	int stop = 0;
+
 	if (THREAD_NO_SYNC())
-		return;
+		return stop;
 
 	THREAD_ENTER_SYNC();
 
@@ -2389,7 +2391,9 @@ static void sync_poll_loop()
 
 	/* *** } */
   exit:
+	stop = (jobs == 0); /* stop when there's nothing left to do */
 	THREAD_EXIT_SYNC();
+	return stop;
 }
 
 /* Runs the polling loop */
@@ -2410,9 +2414,10 @@ static void run_poll_loop()
 		/* Check if we can expire some tasks */
 		next = wake_expired_tasks();
 
-		/* stop when there's nothing left to do */
-		if (jobs == 0)
-			break;
+		/* the first thread requests a synchronization to exit when
+		 * there is no active jobs anymore */
+		if (tid == 0 && jobs == 0)
+			THREAD_WANT_SYNC();
 
 		/* expire immediately if events are pending */
 		exp = now_ms;
@@ -2431,7 +2436,9 @@ static void run_poll_loop()
 
 
 		/* Synchronize all polling loops */
-		sync_poll_loop();
+		if (sync_poll_loop())
+			break;
+
 		activity[tid].loops++;
 	}
 }
-- 
2.17.1



[PATCH] REGTEST: stick-tables: Test expiration when used with table_*

2018-06-20 Thread Daniel Corbett

Hello,

Thanks for adding this integration, Fred. Great job!

Attached is a new regression test to check for stick-tables expiration when
they are used with table_* converters as noted in commit id:
3e60b11100cbc812b77029ca142b83ac7a314db1


Thanks,

-- Daniel

>From 386ed3ca039ea2b5dec7397ba9934576217a421e Mon Sep 17 00:00:00 2001
From: Daniel Corbett 
Date: Wed, 20 Jun 2018 10:16:16 -0400
Subject: [PATCH] REGTEST: stick-tables: Test expiration when used with table_*

New regression test to check for stick-tables expiration when
they are used with table_* converters as noted in commit id:
3e60b11100cbc812b77029ca142b83ac7a314db1.
---
 reg-tests/stick-tables/h0.vtc | 65 +++
 1 file changed, 65 insertions(+)
 create mode 100644 reg-tests/stick-tables/h0.vtc

diff --git a/reg-tests/stick-tables/h0.vtc b/reg-tests/stick-tables/h0.vtc
new file mode 100644
index 000..c4fe6fb
--- /dev/null
+++ b/reg-tests/stick-tables/h0.vtc
@@ -0,0 +1,65 @@
+# commit 3e60b11
+# BUG/MEDIUM: stick-tables: Decrement ref_cnt in table_* converters
+#
+# When using table_* converters, ref_cnt was incremented
+# and never decremented, causing entries to never expire.
+#
+# The root cause appears to be that stktable_lookup_key()
+# was called within all sample_conv_table_* functions, which
+# incremented ref_cnt without decrementing it after completion.
+#
+# Added stktable_release() to the end of each sample_conv_table_*
+# function and reworked the end logic to ensure that ref_cnt is
+# always decremented after use.
+#
+# This should be backported to 1.8
+
+varnishtest "stick-tables: Test expirations when used with table_*"
+
+# As some macros for haproxy are used in this file, this line is mandatory.
+feature ignore_unknown_macro
+
+haproxy h1 -conf {
+# Configuration file of 'h1' haproxy instance.
+defaults
+mode   http
+timeout connect 5s
+timeout server  30s
+timeout client  30s
+
+frontend http1
+bind "fd@${my_frontend_fd}"
+stick-table size 1k expire 1ms type ip store conn_rate(10s),http_req_cnt,http_err_cnt,http_req_rate(10s),http_err_rate(10s),gpc0,gpc0_rate(10s),gpt0
+http-request track-sc0 req.hdr(X-Forwarded-For)
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_http_req_cnt(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_trackers(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),in_table(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_bytes_in_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_bytes_out_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_conn_cnt(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_conn_cur(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_conn_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_gpt0(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_gpc0(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_gpc0_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_http_err_cnt(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_http_err_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_http_req_cnt(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_http_req_rate(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_kbytes_in(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_kbytes_out(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_server_id(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_sess_cnt(http1) -m int lt 0  }
+http-request redirect location https://www.haproxy.com/ if { req.hdr(X-Forwarded-For),table_sess_rate(http1) -m int lt 0  }
+http-request redirect location 

Re: remaining process after (seamless) reload

2018-06-20 Thread William Dauchy
Hello Christopher,

Thank you for the quick answer and the patch.

On Wed, Jun 20, 2018 at 11:32 AM Christopher Faulet  wrote:
> Here is a patch to avoid a thread to exit its polling loop while others
> are waiting in the sync point. It is a theoretical patch because I was
> not able to reproduce the bug.
> Could you check if it fixes it please ?

Unfortunately, it does not fix the issue; I am getting the same backtrace:
#0  0x55f30960a10b in thread_sync_barrier (barrier=0x55f309875528
) at src/hathreads.c:114
#1  thread_enter_sync () at src/hathreads.c:127
#2  0x55f3095b27a2 in sync_poll_loop () at src/haproxy.c:2376
#3  run_poll_loop () at src/haproxy.c:2433
#4  run_thread_poll_loop (data=data@entry=0x55f31c4b0c50) at src/haproxy.c:2463
#5  0x55f30952e856 in main (argc=, argv=) at src/haproxy.c:3065

however the backtrace is different on some machines:

#0  0x7f31fd89ff57 in pthread_join () from /lib64/libpthread.so.0
#1  0x55e184a5486a in main (argc=, argv=) at src/haproxy.c:3069

-- 
William



Re: remaining process after (seamless) reload

2018-06-20 Thread Christopher Faulet

Le 19/06/2018 à 16:42, William Dauchy a écrit :

On Tue, Jun 19, 2018 at 4:30 PM William Lallemand
 wrote:

That's interesting, we can suppose that this bug is not related anymore to the
signal problem we had previously.
Looks like it's blocking in the thread sync point.
Are you able to do a backtrace with gdb? that could help a lot.


yes, sorry, forgot to paste it.

#0  0x56269318f0fb in thread_sync_barrier (barrier=0x5626933fa528
) at src/hathreads.c:112
#1  thread_enter_sync () at src/hathreads.c:125
#2  0x5626931377a2 in sync_poll_loop () at src/haproxy.c:2376
#3  run_poll_loop () at src/haproxy.c:2433
#4  run_thread_poll_loop (data=data@entry=0x5626a7aa1000) at src/haproxy.c:2463
#5  0x5626930b3856 in main (argc=, argv=) at src/haproxy.c:3065



Hi William(s),

Here is a patch to avoid a thread to exit its polling loop while others 
are waiting in the sync point. It is a theoretical patch because I was 
not able to reproduce the bug.


Could you check if it fixes it please ?

Thanks,
--
Christopher Faulet
>From 3576ecdfe108b07c20173d3d82dce5370e796742 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Wed, 20 Jun 2018 11:05:03 +0200
Subject: [PATCH] BUG/MEDIUM: threads: Increase jobs when threads must reach
 the sync point

When a thread wants to enter the sync point, the jobs number is increased. This
is only done for the first thread requesting the sync point. The jobs number is
decreased when the last thread exits the sync point.

This is mandatory to prevent a thread from stopping its polling loop while it
must still enter the sync point. It is really hard to figure out how to hit the
bug, but, in theory, it is possible for a thread to ask for a synchronization
during the HAProxy shutdown. In this case, we can imagine some threads waiting
in the sync point while others are stopping all jobs (listeners, peers...). So a
thread could exit its polling loop without passing through the sync point,
blocking all the others in the sync point and finally leaving the process
stalled and consuming all the CPU.

This patch must be backported to 1.8.
---
 src/hathreads.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/hathreads.c b/src/hathreads.c
index 5db3c2197..295a0b304 100644
--- a/src/hathreads.c
+++ b/src/hathreads.c
@@ -71,8 +71,10 @@ void thread_want_sync()
 	if (all_threads_mask) {
 		if (threads_want_sync & tid_bit)
 			return;
-		if (HA_ATOMIC_OR(&threads_want_sync, tid_bit) == tid_bit)
+		if (HA_ATOMIC_OR(&threads_want_sync, tid_bit) == tid_bit) {
+			HA_ATOMIC_ADD(&jobs, 1);
 			shut_your_big_mouth_gcc(write(threads_sync_pipe[1], "S", 1));
+		}
 	}
 	else {
 		threads_want_sync = 1;
@@ -142,6 +144,7 @@ void thread_exit_sync()
 		char c;
 
 		shut_your_big_mouth_gcc(read(threads_sync_pipe[0], &c, 1));
+		HA_ATOMIC_SUB(&jobs, 1);
 		fd_done_recv(threads_sync_pipe[0]);
 	}
 
-- 
2.17.1



Re: [PATCH] MINOR: reg-tests: Add a few regression testing files.

2018-06-20 Thread Willy Tarreau
On Wed, Jun 20, 2018 at 10:19:16AM +0200, Frederic Lecaille wrote:
> Ok why not REGTEST (=> new patch for CONTRIBUTING).

OK now merged, thank you.

Willy



Re: [PATCH] MINOR: reg-tests: Add a few regression testing files.

2018-06-20 Thread Frederic Lecaille

On 06/20/2018 10:06 AM, Willy Tarreau wrote:

On Wed, Jun 20, 2018 at 09:47:25AM +0200, Frederic Lecaille wrote:

Here are a few more reg testing files, with a special one for Olivier (see
reg-tests/seamless-reload/h0.vtc).


Merged, thanks Fred.

I think we should start to think about using a new tag for patches about
regtests. We already use "TESTS" from time to time for the tests directory
but it contains a bit of whatever. Maybe something short like "REGTEST"
could be easy to spot in logs ? Or feel free to suggest and use anything
else that you see fit.


Ok why not REGTEST (=> new patch for CONTRIBUTING).
>From ac705a3bcc91fcd02e4cc4497f32e9d1641d5c27 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?= 
Date: Wed, 20 Jun 2018 10:14:01 +0200
Subject: [PATCH] DOC: Add new REGTEST tag info about reg testing.

---
 CONTRIBUTING | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/CONTRIBUTING b/CONTRIBUTING
index 9e0d625..b5ba182 100644
--- a/CONTRIBUTING
+++ b/CONTRIBUTING
@@ -566,6 +566,9 @@ patch types include :
 
   - LICENSE  licensing updates (may impact distro packagers).
 
+  - REGTEST  updates to any of the regression testing files found in reg-tests
+ directory, including README or any documentation file.
+
 
 When the patch cannot be categorized, it's best not to put any type tag, and to
 only use a risk or complexity information only as below. This is commonly the
-- 
2.1.4



Re: [PATCH] MINOR: reg-tests: Add a few regression testing files.

2018-06-20 Thread Willy Tarreau
On Wed, Jun 20, 2018 at 09:47:25AM +0200, Frederic Lecaille wrote:
> Here are a few more reg testing files, with a special one for Olivier (see
> reg-tests/seamless-reload/h0.vtc).

Merged, thanks Fred.

I think we should start to think about using a new tag for patches about
regtests. We already use "TESTS" from time to time for the tests directory
but it contains a bit of whatever. Maybe something short like "REGTEST"
could be easy to spot in logs ? Or feel free to suggest and use anything
else that you see fit.

Willy



[PATCH] MINOR: reg-tests: Add a few regression testing files.

2018-06-20 Thread Frederic Lecaille

On 06/19/2018 10:18 AM, Willy Tarreau wrote:

On Mon, Jun 18, 2018 at 07:50:38PM +0200, Frederic Lecaille wrote:

Hello,

Here is a simple patch to add a Makefile target to run all the "*.vtc"
regression testing files found in the 'reg-tests' directory.

(...)

Thank you very much for this, Fred! I hope this will ignite a long
series of such tests. It may be useful to add a README in the reg-tests
directory indicating how to install varnishtest (especially the version
compatible with haproxy), and probably suggest contributing new tests
in the CONTRIBUTING file to encourage the practice.

Willy



Here are a few more reg testing files, with a special one for Olivier (see 
reg-tests/seamless-reload/h0.vtc).


Fred.
>From a95e9c2caf10e2d6393b6ba38269e0afdfc71d64 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?= 
Date: Wed, 20 Jun 2018 07:26:44 +0200
Subject: [PATCH] MINOR: reg-tests: Add a few regression testing files.

---
 reg-tests/log/h0.vtc | 62 ++
 reg-tests/seamless-reload/h0.vtc | 43 
 reg-tests/spoe/h0.vtc| 18 ++
 reg-tests/ssl/README |  2 ++
 reg-tests/ssl/common.pem | 65 
 reg-tests/ssl/h0.vtc | 48 ++
 6 files changed, 238 insertions(+)
 create mode 100644 reg-tests/log/h0.vtc
 create mode 100644 reg-tests/seamless-reload/h0.vtc
 create mode 100644 reg-tests/spoe/h0.vtc
 create mode 100644 reg-tests/ssl/README
 create mode 100644 reg-tests/ssl/common.pem
 create mode 100644 reg-tests/ssl/h0.vtc

diff --git a/reg-tests/log/h0.vtc b/reg-tests/log/h0.vtc
new file mode 100644
index 000..f0ab7ea
--- /dev/null
+++ b/reg-tests/log/h0.vtc
@@ -0,0 +1,62 @@
+# commit d02286d
+# BUG/MINOR: log: pin the front connection when front ip/ports are logged
+#
+# Mathias Weiersmueller reported an interesting issue with logs which Lukas
+# diagnosed as dating back from commit 9b061e332 (1.5-dev9). When front
+# connection information (ip, port) are logged in TCP mode and the log is
+# emitted at the end of the connection (eg: because %B or any log tag
+# requiring LW_BYTES is set), the log is emitted after the connection is
+# closed, so the address and ports cannot be retrieved anymore.
+#
+# It could be argued that we'd make a special case of these to immediately
+# retrieve the source and destination addresses from the connection, but it
+# seems cleaner to simply pin the front connection, marking it "tracked" by
+# adding the LW_XPRT flag to mention that we'll need some of these elements
+# at the last moment. Only LW_FRTIP and LW_CLIP are affected. Note that after
+# this change, LW_FRTIP could simply be removed as it's not used anywhere.
+#
+# Note that the problem doesn't happen when using %[src] or %[dst] since
+# all sample expressions set LW_XPRT.
+
+varnishtest "Wrong ip/port logging"
+feature ignore_unknown_macro
+
+server s1 {
+rxreq
+txresp
+} -start
+
+syslog Slg_1 -level notice {
+recv
+recv
+recv info
+expect ~ \"dip\":\"${h1_fe_1_addr}\",\"dport\":\"${h1_fe_1_port}.*\"ts\":\"cD\",\"
+} -start
+
+haproxy h1 -conf {
+global
+log ${Slg_1_addr}:${Slg_1_port} local0
+
+defaults
+log global
+timeout connect 3000
+timeout client 5
+timeout server  1
+
+frontend fe1
+bind "fd@${fe_1}"
+mode tcp
+log-format {\"dip\":\"%fi\",\"dport\":\"%fp\",\"c_ip\":\"%ci\",\"c_port\":\"%cp\",\"fe_name\":\"%ft\",\"be_name\":\"%b\",\"s_name\":\"%s\",\"ts\":\"%ts\",\"bytes_read\":\"%B\"}
+default_backendbe_app
+
+backend be_app
+server app1 ${s1_addr}:${s1_port} check
+} -start
+
+client c1 -connect ${h1_fe_1_sock} {
+txreq -url "/"
+delay 0.02
+} -run
+
+syslog Slg_1 -wait
+
diff --git a/reg-tests/seamless-reload/h0.vtc b/reg-tests/seamless-reload/h0.vtc
new file mode 100644
index 000..498e0c6
--- /dev/null
+++ b/reg-tests/seamless-reload/h0.vtc
@@ -0,0 +1,43 @@
+# commit b4dd15b
+# BUG/MINOR: unix: Make sure we can transfer abns sockets on seamless reload.
+#
+# When checking if a socket we got from the parent is suitable for a listener,
+# we just checked that the path matched sockname.tmp, however this is
+# unsuitable for abns sockets, where we don't have to create a temporary
+# file and rename it later.
+# To detect that, check that the first character of the sun_path is 0 for
+# both, and if so, that sun_path[1] is the same too.
+
+varnishtest "Seamless reload issue with abns sockets"
+feature ignore_unknown_macro
+
+haproxy h1 -W -conf {
+  global
+stats socket ${tmpdir}/h1/stats level admin expose-fd listeners
+
+  defaults
+mode http
+log global
+option httplog
+timeout connect 15ms
+timeout client  20ms
+timeout server  20ms
+
+  listen testme
+bind "fd@${testme}"
+server test_abns_server abns@wpproc1