Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread Willy Tarreau
On Fri, Oct 12, 2018 at 02:29:51PM +0200, PiBa-NL wrote:
> On 12-10-2018 at 10:53, William Lallemand wrote:
> > The attached patch should fix the issue.
> 
> The patch works for me, thanks.

Great, patch now merged.

Thanks!
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread PiBa-NL

Hi William,

On 12-10-2018 at 10:53, William Lallemand wrote:

The attached patch should fix the issue.


The patch works for me, thanks.

Regards,

PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-12 Thread William Lallemand
On Thu, Oct 11, 2018 at 11:06:38PM +0200, William Lallemand wrote:
> On Thu, Oct 11, 2018 at 09:31:48PM +0200, PiBa-NL wrote:
> > Hi Willy, William,
> > 
> 
> Hi Peter,
> 
> Regarding this part:
>  
> > - Connection and request counters too low when run as a regtest from
> > varnishtest (bug?)
> > It turns out that starting haproxy from varnishtest, and using -W
> > master-worker mode, actually creates 2 processes that are handling
> > traffic. That explains why a large part of the connections isn't seen by
> > the other haproxy instance and the stats show too low connection counts.
> > Bisecting, it seems to start failing at this commit: b3f2be3 ;
> > perhaps @William can you take a look at it? Not really sure when this
> > occurs in a 'real' environment, it doesn't seem to happen when manually
> > running haproxy -W, but still it's strange that this occurs when
> > varnishtest is calling haproxy.
> > 
> 
> There was an exception in the master's code regarding the inherited FDs (fd@),
> which is the case with varnishtest. There is a flag which was preventing the
> master from closing the FD in 1.8.
> 
> Now there is a polling loop in the master and I think there is a side effect
> where the FD is registered in the master polling loop.
> 
> It's just a guess, I'll look at it tomorrow.
> 
> Cheers,
> 

The attached patch should fix the issue.

-- 
William Lallemand
>From 3f2c30a0f15e30b7941245d3c5b20cbd5561489f Mon Sep 17 00:00:00 2001
From: William Lallemand 
Date: Fri, 12 Oct 2018 10:39:54 +0200
Subject: [PATCH] BUG/MEDIUM: mworker: don't poll on LI_O_INHERITED listeners

The listeners with the LI_O_INHERITED flag were deleted but not unbound,
which is a problem since we now have a polling loop in the master.

This patch unbinds every listener that is not required by the master,
but does not close the FD of those that have the LI_O_INHERITED flag.
---
 src/haproxy.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index a7b07a267..82da86222 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -615,13 +615,17 @@ static void mworker_cleanlisteners()
 
 	for (curproxy = proxies_list; curproxy; curproxy = curproxy->next) {
 		list_for_each_entry_safe(l, l_next, &curproxy->conf.listeners, by_fe) {
-			/* does not close if the FD is inherited with fd@
-			 * from the parent process */
-			if (!(l->options & (LI_O_INHERITED|LI_O_MWORKER)))
-				unbind_listener(l);
 			/* remove the listener, but not those we need in the master... */
-			if (!(l->options & LI_O_MWORKER))
+			if (!(l->options & LI_O_MWORKER)) {
+				/* unbind the listener but does not close if
+				   the FD is inherited with fd@ from the parent
+				   process */
+				if (l->options & LI_O_INHERITED)
+					unbind_listener_no_close(l);
+				else
+					unbind_listener(l);
 				delete_listener(l);
+			}
 		}
 	}
 }
-- 
2.16.4
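
As a rough way to double-check the fix locally, the sequence below re-runs the regtest discussed in this thread on a patched tree; the patch file name, build target and vtc path are assumptions rather than something stated in the thread:

  # apply the attached patch on top of the 1.9-dev tree and rebuild
  git am 0001-BUG-MEDIUM-mworker-don-t-poll-on-LI_O_INHERITED-listeners.patch
  make TARGET=freebsd

  # before the fix, a large share of these runs failed because the master
  # also accepted traffic on the inherited fd@ listener
  varnishtest -q -n 100 -j 16 -k reg-tests/loadtest/b0-loadtest.vtc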



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-11 Thread William Lallemand
On Thu, Oct 11, 2018 at 09:31:48PM +0200, PiBa-NL wrote:
> Hi Willy, William,
> 

Hi Peter,

Regarding this part:
 
> - Connection and request counters too low when run as a regtest from
> varnishtest (bug?)
> It turns out that starting haproxy from varnishtest, and using -W
> master-worker mode, actually creates 2 processes that are handling
> traffic. That explains why a large part of the connections isn't seen by
> the other haproxy instance and the stats show too low connection counts.
> Bisecting, it seems to start failing at this commit: b3f2be3 ;
> perhaps @William can you take a look at it? Not really sure when this
> occurs in a 'real' environment, it doesn't seem to happen when manually
> running haproxy -W, but still it's strange that this occurs when
> varnishtest is calling haproxy.
> 

There was an exception in the master's code regarding the inherited FDs (fd@),
which is the case with varnishtest. There is a flag which was preventing the
master from closing the FD in 1.8.

Now there is a polling loop in the master and I think there is a side effect
where the FD is registered in the master polling loop.

It's just a guess, I'll look at it tomorrow.

Cheers,

-- 
William Lallemand



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-11 Thread PiBa-NL

Hi Willy, William,


On 2-10-2018 at 3:56, Willy Tarreau wrote:

it's important to cut this into pieces
to figure what it's showing.


A little update on this issue split in 2 pieces.

- Connection and request counters too low when run as a regtest from
varnishtest (bug?)
It turns out that starting haproxy from varnishtest, and using -W
master-worker mode, actually creates 2 processes that are handling
traffic. That explains why a large part of the connections isn't seen by
the other haproxy instance and the stats show too low connection counts.
Bisecting, it seems to start failing at this commit: b3f2be3 ;
perhaps @William can you take a look at it? Not really sure when this
occurs in a 'real' environment, it doesn't seem to happen when manually
running haproxy -W, but still it's strange that this occurs when
varnishtest is calling haproxy.
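
One rough way to observe the symptom described above is to check which processes hold the listening sockets while the config is running; this is only a sketch for a FreeBSD box like the one used in this thread, and the exact command output is an assumption:

  # in -W master-worker mode you should see one master and its worker(s)
  ps -ax -o pid,ppid,command | grep '[h]aproxy'

  # list the listening sockets per haproxy PID; if the master's PID also
  # shows the frontend socket, it is still polling the inherited fd@ listener
  sockstat -l -P tcp | grep haproxy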


- Request counter too high (possibly an improvement request?)
http_end_txn_clean_session(..) is called and increments the counter on a
finished HTTP response, and I was testing with 2 different methods for
1.8 and 1.9: due to the 'length' converter missing in 1.8 I used a
different check there than in my 1.9 test, which makes the comparison
unfair. Sorry I didn't realize this earlier; I thought they did 'more or
less' the same, but that seems to have been 'less'. Because I also found
the numbers odd/unexpected, I assumed a bigger problem than it actually
seems to be. Perhaps it's not even that bad: haproxy is 'preparing' for a
second request over the keep-alive connection, if I understand correctly,
which eventually doesn't happen but is counted anyway. Maybe that is a
point that can be improved in a future version if time permits? Or would
it even be expected to behave like that?
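
For reference, the effect is visible with the simplified config Willy posts later in this thread; the 127.0.0.1:8001 bind and the /tmp/sock1 stats socket below are taken from that config, so treat them as assumptions about the local setup:

  # keep-alive request: haproxy prepares for a follow-up request that never
  # arrives, so CumReq ends up ahead of CumConns
  echo -e "GET / HTTP/1.1\r\n\r\n" | nc 127.0.0.1 8001 > /dev/null
  echo "show info" | socat - /tmp/sock1 | grep -E 'CumConns|CumReq'

  # explicit close: the two counters stay in step
  echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n" | nc 127.0.0.1 8001 > /dev/null
  echo "show info" | socat - /tmp/sock1 | grep -E 'CumConns|CumReq'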


Regards,
PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-04 Thread Willy Tarreau
Hi Pieter,

On Thu, Oct 04, 2018 at 09:43:36PM +0200, PiBa-NL wrote:
> Hi Willy,
> 
> On 2-10-2018 at 3:56, Willy Tarreau wrote:
> > it's important to cut this into pieces
> > to figure what it's showing.
> 
> Okay, the good thing I suppose is that we already have a reproduction of sorts..
> 
> What would be the best way to try and get to the bottom of this? Add debug
> output to haproxy code? Get some kind of trace? Anything in particular that
> would be of interest?

If you manage to figure out a way to reproduce it with the "connection: close"
header, that's a different bug from what I observed. However to be honest,
given how much work we still have for the rework of the HTTP engine, for now
I'd rather continue to work on the stuff we still have to do than spend one
week figuring why this counter is sometimes wrong, so if you or anyone else
wants to dig this one down to the root cause, I would really appreciate it.

Thanks!
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-04 Thread PiBa-NL

Hi Willy,

On 2-10-2018 at 3:56, Willy Tarreau wrote:

it's important to cut this into pieces
to figure what it's showing.


Okay, the good thing I suppose is that we already have a reproduction of sorts..

What would be the best way to try and get to the bottom of this? Add 
debug output to haproxy code? Get some kind of trace? Anything in 
particular that would be of interest?


Regards,

PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-01 Thread Willy Tarreau
Hi Pieter,

On Mon, Oct 01, 2018 at 11:06:33PM +0200, PiBa-NL wrote:
> I have used the exact config above, and well, it 'works fine' ?? Results
> below.. both types of nc requests sent to 1 running haproxy result in a
> proper '21'..

OK.

> But even in this case the results would be 'predictable', while with my
> original test the outcome is 'randomized'..
> So despite the fact that I don't know what's really wrong, the environment
> it runs on does seem to affect things. There is more going on than simply
> a counter incremented twice.. Also your results are 'higher' than expected,
> while mine are 'lower' than expected.. Perhaps this single testcase is
> already showing multiple issues beneath the carpet ;) ?.

It's very likely :-/ That's why it's important to cut this into pieces
to figure what it's showing.

Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-01 Thread PiBa-NL

Hi Frederic,

On 1-10-2018 at 15:15, Frederic Lecaille wrote:

On 09/15/2018 02:03 AM, PiBa-NL wrote:

Hi List, Willy,


Hi Pieter,

Sorry for this late reply.

I am also sorry to tell you that the -D option should not be used anymore
because it has been recently broken. It adds an extra 2s delay on my PC.
There is a 'newer' version of the regtest in one of the other mails (
https://www.mail-archive.com/haproxy@formilux.org/msg31318.html ). It
only runs 4x10 connections, doesn't use -D anymore, and I'm discussing
some odd behavioral differences between his and my test results with
Willy..


Without this option, on my PC the test fails after having run for about
300ms because the regex does not match. CumConns (resp. CumReq) never
reaches 3001 (resp. 3002).


Note that as this script is highly verbose it may be required to 
increase the varnishtest internal buffer size (see -b varnishtest 
option).
Thanks, I didn't realize that one could have helped; it's probably not
needed anymore with the changed regtest though.
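
For example, the buffer can be raised on the command line when a verbose test needs it (the size value is arbitrary, and the accepted suffix syntax is an assumption about the varnishtest version in use):

  # enlarge varnishtest's internal log buffer so a chatty test is not truncated
  varnishtest -b 5m -k reg-tests/loadtest/b0-loadtest.vtc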


I've created a regtest that checks that the connection counters are kept
properly when concurrent connections are being handled.


I think it could be committed as attached. It takes a few seconds to 
run. It currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue 
on FreeBSD, adding a 'nokqueue' on 1.8.13 makes it succeed though..).


I think it might be a good and reproducible test to run?

Or does it need more tweaking? Thoughts appreciated :).


I have shortened the "haproxy -cli" section like this:

haproxy h1 -cli {
    send "show info"
    expect ~ "CumConns: 3001*\\nCumReq: 3002*"
} -wait
That's a nice one. I did try a .* regex-like wildcard check but found
that it didn't work. This will certainly help to produce less
repeated/unneeded output in case the test fails, and when it doesn't fail
it will be a bit faster I guess, and it allows more checks on the same
set of stats with little effort.


Note that the Slg_1 syslog specs are not completely run because
you do not wait for their termination with the "syslog Slg_1 -wait"
command; this is why it does not fail (but it should: the syslog server
does not receive 15 messages).
I didn't intend to check for 15 messages as a valid outcome, but was
looking more for 'some' syslog output when the testcase fails for some
reason.. With the now fairly limited 40 connections it might be feasible
to check the syslog output more properly though. With 3000 connections it
wasn't really nice when syslog was writing everything to the screen. I'll
take another look at that one.


Regards,
PiBa-NL (Pieter)



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-01 Thread PiBa-NL

Hi Willy,

On 1-10-2018 at 4:00, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 10:16:55PM +0200, PiBa-NL wrote:

Hi Willy,
On 30-9-2018 at 20:38, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:

Indeed it works with 1.8, so in that regard i 'think' the test itself is
correct.. Also when disabling threads, or running only 1 client, it still
works.. Then both CumConns CumReq show 11 for the first stats result.

Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.

So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for
1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me something but
I don't know where I noticed something like this, I think I saw this
when reviewing the changes needed to be made to HTTP for the native
internal representation.

So I think it's a minor bug, but not a regression.

Thanks,
Willy

> Not sure; the only difference between 100x FAILED and 100x OK is the version
> here. Command executed and result below.

> Perhaps that's just because of the OS / scheduler used though; I assume you're
> using some Linux distro to test with, perhaps that explains part of the
> differences between your and my results..

I find this strange. Your config contains a comment about the length
converter missing from 1.8 so I had to change it to use the deny part
on 1.8. It happens that using deny here precisely is what fixed the
problem for me the first time. I simplified it this way to run a
manual test :

   global
 stats socket /tmp/sock1 mode 666 level admin
 #nbthread 3
 log 127.0.0.1:5514 local0
 #nokqueue

   defaults
 mode http
 option dontlog-normal
 log global
 option httplog
 timeout connect 3s
 timeout client  4s
 timeout server  15s

   frontend fe1
 bind 127.0.0.1:8001
 acl donelooping hdr(TEST) -m len 10
 http-request set-header TEST "%[hdr(TEST)]x"
 use_backend b2 if donelooping
 default_backend b1

   backend b1
 server srv1 127.0.0.1:8001

   frontend fe2
 bind 127.0.0.1:8002
 default_backend b2

   backend b2
 server srv2 127.0.0.1:8000

Then I test it this way and got the same results on all versions :

   $ echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
   $ echo show info | socat - /tmp/sock1 | grep Cum
   CumConns: 11
   CumReq: 21
   CumSslConns: 0

   $ echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
   $ echo show info | socat - /tmp/sock1 | grep Cum
   CumConns: 11
   CumReq: 11
   CumSslConns: 0

Regards,
Willy


I have used the exact config above, and well, it 'works fine' ?? Results
below.. both types of nc requests sent to 1 running haproxy result in a
proper '21'..


But even in this case the results would be 'predictable', while with my
original test the outcome is 'randomized'..
So despite the fact that I don't know what's really wrong, the
environment it runs on does seem to affect things. There is more going
on than simply a counter incremented twice.. Also your results are
'higher' than expected, while mine are 'lower' than expected.. Perhaps
this single testcase is already showing multiple issues beneath the
carpet ;) ?.


Regards,
PiBa-NL (Pieter)

# haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau 

# echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

503 Service Unavailable
No server is available to handle this request.

# echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

503 Service Unavailable
No server is available to handle this request.

# echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 21
CumReq: 21
CumSslConns: 0
#
#
# haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau 

# echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

503 Service Unavailable
No server is available to handle this request.

# echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Content-Type: text/html

503 Service Unavailable
No server is available to handle this request.

# echo show info | socat - /tmp/sock1 | grep Cum
CumConns: 21
CumReq: 21
CumSslConns: 0
#




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-10-01 Thread Frederic Lecaille

On 09/15/2018 02:03 AM, PiBa-NL wrote:

Hi List, Willy,


Hi Pieter,

Sorry for this late reply.

I am also sorry to tell you that the -D option should not be used anymore
because it has been recently broken. It adds an extra 2s delay on my PC.


Without this option, on my PC the test fails after having run for about
300ms because the regex does not match. CumConns (resp. CumReq) never
reaches 3001 (resp. 3002).


Note that as this script is highly verbose it may be required to 
increase the varnishtest internal buffer size (see -b varnishtest option).


I've created a regtest that checks that the connection counters are kept
properly when concurrent connections are being handled.


I think it could be committed as attached. It takes a few seconds to 
run. It currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on 
FreeBSD, adding a 'nokqueue' on 1.8.13 makes it succeed though..).


I think it might be a good and reproducible test to run?

Or does it need more tweaking? Thoughts appreciated :).


I have shortened the "haproxy -cli" section like this:

haproxy h1 -cli {
send "show info"
expect ~ "CumConns: 3001*\\nCumReq: 3002*"
} -wait

Note that the Slg_1 syslog specs are not completely run because you
do not wait for their termination with the "syslog Slg_1 -wait" command;
this is why it does not fail (but it should: the syslog server does not
receive 15 messages).



Regards,

PiBa-NL (Pieter)






Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,
On 30-9-2018 at 20:38, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:

Indeed it works with 1.8, so in that regard i 'think' the test itself is
correct.. Also when disabling threads, or running only 1 client, it still
works.. Then both CumConns CumReq show 11 for the first stats result.

Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.

So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for
1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me something but
I don't know where I noticed something like this, I think I saw this
when reviewing the changes needed to be made to HTTP for the native
internal representation.

So I think it's a minor bug, but not a regression.

Thanks,
Willy


Not sure; the only difference between 100x FAILED and 100x OK is the
version here. Command executed and result below.


Perhaps that's just because of the OS / scheduler used though; I assume
you're using some Linux distro to test with, perhaps that explains part
of the differences between your and my results.. In the end it doesn't
matter much whether it's a bug or a regression, it still needs a fix ;).
And well, I don't know if it's just the counter that's wrong, or whether
there might be bigger consequences somewhere. If it's just the counter
then I guess it wouldn't hurt much to postpone a fix to a next (dev?)
version.


Regards,

PiBa-NL (Pieter)

root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc

...
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.128) exit=2
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.135) exit=2

100 tests failed, 0 tests skipped, 0 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau 

root@freebsd11:/usr/ports/net/haproxy-devel # pkg add -f haproxy-1.8.14-selfbuild-reg-tests-OK.txz

Installing haproxy-1.8...
package haproxy is already installed, forced install
Extracting haproxy-1.8: 100%
root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc

0 tests failed, 0 tests skipped, 100 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau 






Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread Willy Tarreau
On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:
> On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:
> > Indeed it works with 1.8, so in that regard i 'think' the test itself is
> > correct.. Also when disabling threads, or running only 1 client, it still
> > works.. Then both CumConns CumReq show 11 for the first stats result.
> 
> Hmmm for me it fails even without threads. That was the first thing I
> tried when meeting the error in fact. But I need to dig deeper.

So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for
1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me something but
I don't know where I noticed something like this, I think I saw this
when reviewing the changes needed to be made to HTTP for the native
internal representation.

So I think it's a minor bug, but not a regression.

Thanks,
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread Willy Tarreau
On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:
> Indeed it works with 1.8, so in that regard i 'think' the test itself is
> correct.. Also when disabling threads, or running only 1 client, it still
> works.. Then both CumConns CumReq show 11 for the first stats result.

Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.

> > However, I'd like to merge
> > the fix before merging the regtest otherwise it will kill the reg-test
> > feature until we manage to get the issue fixed!
> I'm not fully sure I agree on that.. While I understand that failing
> reg-tests can be a pita while developing (if you run them regularly), the fact
> is that currently existing tests can already start to fail after
> some major redesign of the code. A few mails back (different mail thread) I
> tested like 10 commits in a row and they all suffered from different failing
> tests; that would imho not be a reason to remove those tests, and they didn't
> stop development.

The reason is that for now we have no way to let the tests fail gracefully
and report what is OK and what is not. So any error that's in the way will
lead to an absolutely certain behaviour from everyone : nobody will run the
tests anymore since the result will be known.

Don't get me wrong, I'm willing to get as many tests as we can, but 1) we
have to be sure these tests only fail for regressions and not for other
reasons, and 2) we must be sure that these tests do not prevent other ones
from being run nor make it impossible to observe the progress on other
ones. We're still at the beginning with reg tests, and as you can see we
have not even yet sorted out the requirements for some of them like threads
or Lua or whatever else.

I'm just asking that we don't create tests faster than we can sort them
out, that's all. This probably means that we really have to work on these
two main areas which are test prerequisites and synthetic reports of what
worked and what failed, as sketched below.
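
A minimal sketch of such a report, built only from varnishtest's own one-line verdicts (paths and file names are illustrative, not an agreed convention):

  # run everything, keep going on errors, reduce the output to a sorted
  # pass/fail list, and diff it against the previous run
  varnishtest -k -j 16 -t 20 reg-tests/*/*.vtc 2>&1 \
    | grep -E '^# +top +TEST .* (passed|FAILED)' \
    | sort > reg-tests.report
  diff reg-tests.report.prev reg-tests.report || echo "results changed"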

Ideas and proposals on this are welcome, but to be honest I can't spend
as much time as I'd want on this for now given how late we are on all
what remains to be done, so I really welcome discussions and help on the
subject between the various actors.

Thanks,
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread Willy Tarreau
On Sun, Sep 30, 2018 at 07:15:59PM +0200, PiBa-NL wrote:
> > on a simple config, the CumConns always matches the CumReq, and when
> > running this test I'm seeing random values there in the output, but I
> > also see that they are retrieved before all connections are closed
> But CurrConns is 0, so connections are (supposed to be?) closed? :
> 
>  h1    0.0 CLI recv|CurrConns: 0
>  h1    0.0 CLI recv|CumConns: 27
>  h1    0.0 CLI recv|CumReq: 27

You're totally right, I think I confused CumConns and CurrConns when
looking at the output. With that said I have no idea what's going on,
I'll have another look.

Thanks,
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,

On 30-9-2018 at 7:46, Willy Tarreau wrote:

Hi Pieter,

On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote:

Hi Willy,

I thought lets give those reg-test another try :) as its easy to run and
dev3 just came out.
All tests pass on my FreeBSD system, except this one, new reg-test attached.

Pretty much the same test as previously send, but now with only 4 x 10
connections. Which should be fine for conntrack and sysctls (i hope..). It
seems those stats numbers are 'off', or is my expected value not as fixed as
i thought it would be?

Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse.
Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. It also still works when disabling threads, or running only 1
client.. Then both CumConns and CumReq show 11 for the first stats result.

However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
I'm not fully sure I agree on that.. While I understand that failing
reg-tests can be a pita while developing (if you run them regularly), the
fact is that currently existing tests can already start to fail
after some major redesign of the code. A few mails back (different
mail thread) I tested like 10 commits in a row and they all suffered from
different failing tests; that would imho not be a reason to remove those
tests, and they didn't stop development.

I'm also seeing that you rely on threads, I think I noticed another test
involving threads. Probably that we should have a specific directory for
these ones that we can disable completely when threads are not enabled,
otherwise this will also destroy tests (and make them extremely slow due
to varnishtest waiting for the timeout if haproxy refuses to parse the
config).
A specific directory will imho not work. How should it be called?
/threaded_lua_with_ssl_using_kqueue_scheduler_on_freebsd_without_abns_for_haproxy_1.9_and_higher/
?
Having varnishtest fail while waiting for a feature that was not
compiled is indeed undesirable as well. So some 'smart' way of defining
'requirements' for a test will be needed so they can gracefully skip if
not applicable.. I'm not sure myself how that should look though..
On one side I think the .vtc itself might be the place to define what
requirements it has, on the other hand a separate list/script
including the logic of which tests to run could be nice.. But then who is
going to maintain that one..

I think that we should think a bit forward based on these tests. We must
not let varnishtest stop on the first error but rather just log it.

varnishtest can continue on error with -k.
I'm using this little mytest.sh script at the moment; it runs all tests
and only failed tests produce a lot of logging:

  haproxy -v
  varnishtest -j 16 -k -t 20 ./work/haproxy-*/reg-tests/*/*.vtc > ./mytest-result.log 2>&1
  varnishtest -j 16 -k -t 20 ./haproxy_test_OK_20180831/*/*.vtc >> ./mytest-result.log 2>&1

  cat ./mytest-result.log
  echo "" >> ./mytest-result.log
  haproxy -vv  >> ./mytest-result.log

There is also the -q parameter, but then it doesn't log which tests
passed anymore; only the failed tests will produce 1 log line..
(I do like to log which tests were executed though..)

  Then
at the end we could produce a report of successes and failures that would
be easy to diff from the previous (or expected) one. That will be
particularly useful when running the tests on older releases. As an
example, I had to run your test manually on 1.8 because for I-don't-know-
what-reason, the one about the proxy protocol now fails while it used to
work fine last week for the 1.8.14 release. That's a shame that we can't
complete tests just because one randomly fails.
You can continue the tests ( -k ), but better write the output to a logfile
then, or perhaps combine it with -l, which leaves the /tmp/vtc.* folders behind..
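
A hedged example of that combination (the /tmp/vtc.* naming is varnishtest's usual temporary directory layout, mentioned here as an assumption):

  # keep going on failures, write everything to a logfile, and leave the
  # working directories of failed tests behind for inspection
  varnishtest -k -l -j 16 -t 20 reg-tests/*/*.vtc > reg-tests.log 2>&1
  ls -d /tmp/vtc.* 2>/dev/null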

Thanks,
Willy


Regards,
PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,

On 30-9-2018 at 7:56, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote:

Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!

By the way, could you please explain in simple words the issue you've
noticed ? I tried to reverse the vtc file but I don't understand the
details nor what it tries to achieve. When I'm running a simple test
on a simple config, the CumConns always matches the CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed

But CurrConns is 0, so connections are (supposed to be?) closed? :

 h1    0.0 CLI recv|CurrConns: 0
 h1    0.0 CLI recv|CumConns: 27
 h1    0.0 CLI recv|CumReq: 27


, so
I'm not even sure the test is correct :-/

Thanks,
Willy


What I'm trying to achieve is, well.. testing for regressions that are
not yet known to exist on the current stable version.


So what this test does, in short:
It makes 4 clients simultaneously send a request to a threaded haproxy,
which in turn connects 10x from backend to frontend and then sends the
request to the s1 server. This with the intended purpose of having
several connections started and torn down as fast as haproxy can process
them, while trying to have a high probability of items being added
to/removed from lists/counters by different threads, thus possibly
creating problems if some lock/sync isn't done correctly. After firing a
few requests it also verifies the expected counters and results where
possible..
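
Outside varnishtest, roughly the same load pattern can be produced by hand against the simplified config quoted elsewhere in this thread; the address, port and stats socket below come from that config and are assumptions about the local setup:

  # four concurrent clients, each request looping about 10 times through
  # fe1/b1 before reaching the final backend, then a counter snapshot
  for i in 1 2 3 4; do
    ( printf 'GET / HTTP/1.1\r\nconnection: close\r\n\r\n' \
        | nc 127.0.0.1 8001 > /dev/null ) &
  done
  wait
  echo "show info" | socat - /tmp/sock1 | grep -E 'CurrConns|CumConns|CumReq'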


History:
I've been bitten a few times with older releases by corruption occurring
inside the POST data when uploading large (500MB+) files to a server
behind haproxy. After a few megabytes passed correctly, the resulting
file would contain differences from the original when compared; the
upload 'seemed' to succeed though. (This was then solved by installing a
newer haproxy build..) Also, sometimes threads have locked up or
crashed things, or the kqueue poller turned out to behave differently
than the others.. I've been trying to test such things manually but found
I always forget to run some test. This is why I really like the concept
of having a set of defined tests that validate that haproxy is working
'properly' on the OS I run it on.. Also, when some issue I ran into gets
fixed I tend to run -dev builds on my production environment for a while,
and it's nice to know that other functionality still works as it used
to..


When writing this test I initially started with the idea of
automatically testing a large file transfer through haproxy, but then
wondered where / how to leave such a file, so I thought transferring a
'large' header with increasing size 'might' trigger a similar
condition.. Though in hindsight that might not actually test the same
code paths..


That test, with 1 byte of growth in the header together with 4000
connections, didn't quite achieve that initial big-file simulation, but I
still thought it ended up being a nice test, so I submitted it a while
back ;) .. Anyhow, haproxy wasn't capable of doing much when dev2 was
tagged, so I wasn't too worried that the test failed at that time.. And
you announced dev2 as such as well, so that was okay. And perhaps the
issue found then would solve itself when further fixes on top of dev2
were added ;).


Anyhow, with dev3 I hoped all regressions would be fixed, but found this
one still failed on 1.9-dev3. So I tuned the numbers in the previously
submitted regtest down a little to avoid conntrack/sysctl default
limits, while still failing the test 'reliably'.. I'm not sure what
exactly is going on, or how bad it is that these numbers don't match up
anymore.. Maybe it's only the counter that's not updated in a thread-safe
way, perhaps there is a bigger issue lurking with sync points and
whatnot..? Either way the test should pass as I understand it: the 4
defined varnish clients got their answer back and CurrConns = 0, and
adding a 3 second delay between waiting for the clients and checking the
stats does not fix it... And as you've checked, with 1.8 it does pass.
Though that too could perhaps be a coincidence; maybe things are
processed even faster now but in a different order, so the test fails
for the wrong reason?


Hope that makes my thought process somewhat clear :).

Regards,

PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-29 Thread Willy Tarreau
On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote:
> Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
> spotted a regression that we have to analyse. However, I'd like to merge
> the fix before merging the regtest otherwise it will kill the reg-test
> feature until we manage to get the issue fixed!

By the way, could you please explain in simple words the issue you've
noticed ? I tried to reverse the vtc file but I don't understand the
details nor what it tries to achieve. When I'm running a simple test
on a simple config, the CumConns always matches the CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed, so
I'm not even sure the test is correct :-/

Thanks,
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-29 Thread Willy Tarreau
Hi Pieter,

On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote:
> Hi Willy,
> 
> I thought lets give those reg-test another try :) as its easy to run and
> dev3 just came out.
> All tests pass on my FreeBSD system, except this one, new reg-test attached.
> 
> Pretty much the same test as previously send, but now with only 4 x 10
> connections. Which should be fine for conntrack and sysctls (i hope..). It
> seems those stats numbers are 'off', or is my expected value not as fixed as
> i thought it would be?

Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!

I'm also seeing that you rely on threads, I think I noticed another test
involving threads. Probably that we should have a specific directory for
these ones that we can disable completely when threads are not enabled,
otherwise this will also destroy tests (and make them extremely slow due
to varnishtest waiting for the timeout if haproxy refuses to parse the
config).

I think that we should think a bit forward based on these tests. We must
not let varnishtest stop on the first error but rather just log it. Then
at the end we could produce a report of successes and failures that would
be easy to diff from the previous (or expected) one. That will be
particularly useful when running the tests on older releases. As an
example, I had to run your test manually on 1.8 because for I-don't-know-
what-reason, the one about the proxy protocol now fails while it used to
work fine last week for the 1.8.14 release. That's a shame that we can't
complete tests just because one randomly fails.

Thanks,
Willy



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-29 Thread PiBa-NL

Hi Willy,

I thought let's give those reg-tests another try :) as it's easy to run
and dev3 just came out.

All tests pass on my FreeBSD system, except this one, new reg-test attached.

Pretty much the same test as previously sent, but now with only 4 x 10
connections, which should be fine for conntrack and sysctls (I hope..).
It seems those stats numbers are 'off', or is my expected value not as
fixed as I thought it would be?


Tested with:
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
FreeBSD freebsd11 11.1-RELEASE

Results:
 h1    0.0 CLI recv|CumConns: 33
 h1    0.0 CLI recv|CumReq: 65
 h1    0.0 CLI expect failed ~ "CumConns: 41"

If my 'expect' is correct, would the patch be suitable for inclusion
with the other reg-tests this way?
If you want to rename loadtest to heavytest, or make any other tweaks,
please feel free to do so.


Regards,
PiBa-NL (Pieter)

On 20-9-2018 at 22:25, PiBa-NL wrote:

Hi Willy,

On 20-9-2018 at 13:56, Willy Tarreau wrote:

For me the test produces like 345 lines of output as attached, which seems
not too bad (if the test succeeds).

It's already far too much for a user.


Well those 345 lines are if it succeeds while in 'verbose' mode; in
'normal' mode it only produces 1 line of output when successful.
Pretty much all tests produce 100+ lines of 'logging' if they fail for
some reason. From what I've seen varnishtest either produces a bulk of
logging on a failure, or it only logs the failures. There isn't much
in between.


As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)



From 28377ffe246ed1db0e0d898fa6263eccdc68c490 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Sat, 15 Sep 2018 01:51:54 +0200
Subject: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
 after running 4 x 10 looping requests

---
 reg-tests/loadtest/b0-loadtest.vtc | 99 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 reg-tests/loadtest/b0-loadtest.vtc

diff --git a/reg-tests/loadtest/b0-loadtest.vtc b/reg-tests/loadtest/b0-loadtest.vtc
new file mode 100644
index ..590924e1
--- /dev/null
+++ b/reg-tests/loadtest/b0-loadtest.vtc
@@ -0,0 +1,99 @@
+# Checks that request and connection counters are properly kept
+
+varnishtest "Connection counters check"
+feature ignore_unknown_macro
+
+server s1 {
+    rxreq
+    expect req.http.TESTsize == 10
+    txresp
+} -repeat 4 -start
+
+syslog Slg_1 -level notice {
+    recv
+} -repeat 15 -start
+
+haproxy h1 -W -conf {
+  global
+    nbthread 3
+    log ${Slg_1_addr}:${Slg_1_port} local0
+    #nokqueue
+
+  defaults
+    mode http
+    option dontlog-normal
+    log global
+    option httplog
+    timeout connect 3s
+    timeout client  4s
+    timeout server  15s
+
+  frontend fe1
+    bind "fd@${fe_1}"
+    acl donelooping hdr(TEST) -m len 10
+    http-request set-header TEST "%[hdr(TEST)]x"
+    use_backend b2 if donelooping
+    default_backend b1
+
+  backend b1
+    server srv1 ${h1_fe_1_addr}:${h1_fe_1_port}
+
+  frontend fe2
+    bind "fd@${fe_2}"
+    default_backend b2
+
+  backend b2
+    # haproxy 1.8 does not have the ,length converter.
+    #acl OK hdr(TEST) -m len 10
+    #http-request deny deny_status 200 if OK
+    #http-request deny deny_status 400
+
+    # haproxy 1.9 does have a ,length converter.
+    http-request set-header TESTsize "%[hdr(TEST),length]"
+    http-request del-header TEST
+    server srv2 ${s1_addr}:${s1_port}
+
+} -start
+
+barrier b1 cond 4
+
+client c1 -connect ${h1_fe_1_sock} {
+    timeout 17
+    barrier b1 sync
+    txreq -url "/"
+    rxresp
+    expect resp.status == 200
+} -start
+client c2 -connect ${h1_fe_1_sock} {
+    timeout 17
+    barrier b1 sync
+    txreq -url "/"
+    rxresp
+    expect resp.status == 200
+} -start
+client c3 -connect ${h1_fe_1_sock} {
+    timeout 17
+    barrier b1 sync
+    txreq -url "/"
+    rxresp
+    expect resp.status == 200
+} -start
+client c4 -connect ${h1_fe_1_sock} {
+    timeout 17
+    barrier b1 sync
+    txreq -url "/"
+    rxresp
+    expect resp.status == 200
+} -start
+
+client c1 -wait
+client c2 -wait
+client c3 -wait
+client c4 -wait
+
+haproxy h1 -cli {
+    send "show info"
+    expect ~ "CumConns: 41"
+    send "show info"
+    expect ~ "CumReq: 42"
+}
-- 
2.18.0.windows.1
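
For reference, a single verbose run of the attached test makes it easy to see which CumConns/CumReq values were actually reported when the final 'expect' fails; the paths assume the patch has been applied to a reg-tests checkout:

  # one verbose run to inspect the reported counters on failure
  varnishtest -v reg-tests/loadtest/b0-loadtest.vtc

  # or many quiet runs to gauge how often the mismatch occurs
  varnishtest -q -n 100 -j 16 -k reg-tests/loadtest/b0-loadtest.vtc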



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-20 Thread PiBa-NL

Hi Willy,

On 20-9-2018 at 13:56, Willy Tarreau wrote:

For me the test produces like 345 lines of output as attached, which seems
not too bad (if the test succeeds).

It's already far too much for a user.


Well those 345 lines are if it succeeds while in 'verbose' mode; in
'normal' mode it only produces 1 line of output when successful. Pretty
much all tests produce 100+ lines of 'logging' if they fail for some
reason. From what I've seen varnishtest either produces a bulk of logging
on a failure, or it only logs the failures. There isn't much in between.


As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-20 Thread Willy Tarreau
Hi Pieter,

On Thu, Sep 20, 2018 at 12:48:35AM +0200, PiBa-NL wrote:
> Test takes like 5 seconds to run here, and while that is a bit long if you
> get a hundred more similar tests and want to continue tweaking developments
> while running tests in between. It wouldn't hurt to run such a (series) of
> longer tests before creating a patch and submitting it for inclusion on the
> official git repository in my opinion, or before a release.

Definitely, however those run before a release should be almost 100%
system-agnostic. Having to prepare the system and tune it properly for
the test not to fail is going to cause a headache every time there is
a failure because it will mean fixing the root cause and re-run the
whole suite, which precisely is what will make all this suite not to
be used at all anymore. This is why I'm really picky on the reliability
and the speed of these tests. It should be stupid-proof, with me being
the stupid (and believe me when it comes to doing repetitive stuff, I'm
among the stupidest persons you have ever met).

> My attempt was
> to test a bit differently than just looking for regressions of known fixed
> bugs, and putting a little load on haproxy so that threads and simultaneous
> actions 'might' get into conflicts/locks/stuff which might by chance, show
> up, which is why i choose to go a little higher on the number of round-trips
> with ever slightly increasing payload..

I really understand the point and I think it is valid to a certain extent.
But that's really not a thing to run by default. And I want to encourage us
(including me) to run reg tests from time to time. If you know that some of
them will take too long, you'll quickly end up avoiding all the ones you can
easily avoid using a single command (eg: playing with the LEVEL variable, or
not running it at all).

> For me the test produces like 345 lines of output as attached, which seems
> not too bad (if the test succeeds).

It's already far too much for a user. I should only know if it works
or not, otherwise it hides the output of all other ones (which is what
happened). We must not have to parse the output to know if we didn't
break anything, we just have to check that it looks normal. Here's what
make reg-tests gives me on 1.8 :

  willy@wtap:haproxy$ time sh make-reg-tests-1.8 
  #top  TEST reg-tests/lua/b2.vtc passed (0.159)
  #top  TEST reg-tests/lua/b1.vtc passed (0.122)
  #top  TEST reg-tests/lua/b0.vtc passed (0.110)
  #top  TEST reg-tests/lua/b3.vtc passed (0.137)
  #top  TEST reg-tests/connection/b0.vtc passed (0.172)
  #top  TEST reg-tests/server/b0.vtc passed (0.110)
  #top  TEST reg-tests/spoe/b0.vtc passed (0.008)
  #top  TEST reg-tests/ssl/b0.vtc passed (0.139)
  #top  TEST reg-tests/stick-table/b1.vtc passed (0.110)
  #top  TEST reg-tests/stick-table/b0.vtc passed (0.110)
  #top  TEST reg-tests/log/b0.vtc passed (0.125)
  #top  TEST reg-tests/seamless-reload/b0.vtc passed (0.123)
  
  real0m1.713s
  user0m0.316s
  sys 0m0.068s

As you can see there's no output to parse, it *visually* looks correct.

> Besides the 2 instances of cli output
> for stats, it seems not that much different from other tests..
> And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST
> ./test/b0-loadtest.vtc passed (4.800

OK then you get a valid output there. It's here that it's ugly. But we
spend enough time analysing bugs, I really refuse to spend extra time
fixing bugs in tests supposed to help detect bugs, otherwise it becomes
recursive...

> Taking into account conntrack and ulimit, would that mean we can never
> 'reg-test' whether haproxy can really handle like 10000 connections without
> issue?

10k conns definitely is way beyond what you can expect from a non-root
user on a regular shell. I run most of my configs with "maxconn 400"
because that's less than 1024 FDs once you add the extra FDs for
listeners and checks. Anything beyond that will depend on the users'
setup and becomes tricky. And in this case it's more a matter of
stress-testing the system, and we can have stress-test procedures or
tools (just like we all do on our respective setups with different
tools). It's just that one has to know in advance that some preparation
is needed (typically killing a possible browser, unloading some modules,
checking that there's still memory left, maybe adding some addresses to
the loopback, etc). So it's a different approach and it definitely is
out of the scope of automatizing detection of potential regressions
during development.

> Or should the environment be configured by the test?? ,that seems
> very tricky at least and probably would be undesirable..

No definitely it would be even worse. For sure those used to run
"curl | sudo bash" will have no problem letting it configure their
system, but I'm not among such people and I appreciate a lot that
my machine works every morning when I want to 

Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-19 Thread PiBa-NL

Hi Willy,

On 19-9-2018 at 7:36, Willy Tarreau wrote:

Hi Pieter,

I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc), and it takes far too
long a time to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".
The test takes like 5 seconds to run here, and while that is a bit long
if you get a hundred more similar tests and want to continue tweaking
developments while running tests in between, it wouldn't hurt to run
such a series of longer tests before creating a patch and submitting
it for inclusion in the official git repository, in my opinion, or
before a release. My attempt was to test a bit differently than just
looking for regressions of known fixed bugs, by putting a little load on
haproxy so that threads and simultaneous actions 'might' get into
conflicts/locks/stuff which might, by chance, show up, which is why I
chose to go a little higher on the number of round-trips with an ever
slightly increasing payload..


For me the test produces like 345 lines of output as attached, which
seems not too bad (if the test succeeds).. Besides the 2 instances of cli
output for stats, it seems not that much different from other tests..
And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST
./test/b0-loadtest.vtc passed (4.800


Taking into account conntrack and ulimit, would that mean we can never
'reg-test' whether haproxy can really handle like 10000 connections
without issue? Or should the environment be configured by the test??
That seems very tricky at least and probably would be undesirable.. (I
just today figured I could run reg-tests also on my production box to
compare whether a new build showed issues there that my test box might
not.. I wouldn't want system settings to be changed by a reg-test run..)



I think we should create a distinct category for such tests
Agreed, which is why I used the currently non-existing '/loadtest/'
folder. If '/heavy/' is better, that's of course alright with me too.

, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably that such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
have this :

  $ make reg-tests/heavy/conn-counter-3000-req.log
I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop").
I would still like to have a way to run all 'applicable' tests with 1
command, even if it takes an hour or so to verify haproxy is working
'perfectly'. But tests like the abns@ ones can't work on FreeBSD; they
should not 'fail' but perhaps get skipped automatically.. Anyhow, that's
a question for my other mail-topic (
https://www.mail-archive.com/haproxy@formilux.org/msg31195.html )

It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, race conditions (eg show sess in threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.

Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.
Okay, no problem :) , I'll keep running this particular test myself for
the moment; it 'should' be able to pass normally.. (on my environment
anyhow..)

Thanks,
Willy


Thanks for your comments, and thoughts.

I'm interested in Fred's and anyone else's opinion ;) , and well, maybe
this particular test-case could be replaced by something simpler/faster
with more or less the same likelihood of catching yet unknown issues..?
Looking forward to reactions :) .


Thanks and regards,

PiBa-NL (Pieter)

 top   0.0 extmacro def pwd=/usr/ports/net/haproxy-devel
 top   0.0 extmacro def localhost=127.0.0.1
 top   0.0 extmacro def bad_backend=127.0.0.1 58530
 top   0.0 extmacro def bad_ip=192.0.2.255
 top   0.0 macro def testdir=/usr/ports/net/haproxy-devel/./test
 top   0.0 macro def tmpdir=/tmp/vtc.35996.290f74a9
*top   0.0 TEST ./test/b0-loadtest.vtc starting
**   top   0.0 === varnishtest "Seamless reload issue with abns sockets"
*top   0.0 TEST Seamless reload issue with abns sockets
**   top   0.0 === feature ignore_unknown_macro
**   top   0.0 === server s1 {
**   s10.0 Starting server
 s10.0 macro def 

Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-18 Thread Willy Tarreau
Hi Pieter,

On Sat, Sep 15, 2018 at 02:03:45AM +0200, PiBa-NL wrote:
> Hi List, Willy,
> 
> I've created a regtest that checks that when concurrent connections are
> being handled that the connection counters are kept properly.
> 
> I think it could be committed as attached. It takes a few seconds to run. It
> currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on FreeBSD,
> adding a 'nokqueue' on 1.8.13 makes it succeed though..).
> 
> I think it might be a good and reproducible test to run.?
> 
> Or does it need more tweaking? Thoughts appreciated :).

I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc), and it takes far too
long a time to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".

I think we should create a distinct category for such tests, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably that such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
have this :

 $ make reg-tests/heavy/conn-counter-3000-req.log

It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, race conditions (eg show sess in threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.
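
Until such a make target exists, a small wrapper gives roughly the behaviour sketched above; the function name and the vtc/log paths are purely illustrative:

  # run one heavy test, capture its full log next to the vtc file, and
  # print a one-line verdict
  run_heavy_test() {
      vtc="$1"
      varnishtest -k -l "$vtc" > "${vtc%.vtc}.log" 2>&1 \
          && echo "PASS $vtc" || echo "FAIL $vtc (see ${vtc%.vtc}.log)"
  }

  run_heavy_test reg-tests/heavy/conn-counter-3000-req.vtc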

Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.

Thanks,
Willy