Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Fri, Oct 12, 2018 at 02:29:51PM +0200, PiBa-NL wrote: > Op 12-10-2018 om 10:53 schreef William Lallemand: > > The attached patch should fix the issue. > > The patch works for me, thanks. Great, patch now merged. Thanks! Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi William, Op 12-10-2018 om 10:53 schreef William Lallemand: The attached patch should fix the issue. The patch works for me, thanks. Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Thu, Oct 11, 2018 at 11:06:38PM +0200, William Lallemand wrote: > On Thu, Oct 11, 2018 at 09:31:48PM +0200, PiBa-NL wrote: > > Hi Willy, William, > > > > Hi Peter, > > Regarding this part: > > > -Connection and request counters to low when ran as regtest from > > varnishtest (bug?) > > It turns out that starting haproxy from varnishtest, and using -W > > master-worker mode, actually creates 2 processes that are handling > > traffic. That explains that a large part of connections isn't seen by > > the other haproxy instance and stats showing to low amounts of > > connections. Bisecting it seems to fail on this commit: b3f2be3 , > > perhaps @William can you take a look at it? Not really sure when this > > occurs in a 'real' environment, it doesn't seem to happen when manually > > running haproxy -W, but still its strange that when varnisttest is > > calling haproxy this occurs. > > > > There was an exception in the master's code regarding the inherited FDs (fd@), > which is the case with varnishtest. There is a flag which was forbidding the > master to close the FD in 1.8. > > Now there is a polling loop in the master and I think there is a side effect > where the FD is registered in the master polling loop. > > It's just a guess, I'll look at it tomorrow. > > Cheers, > The attached patch should fix the issue. -- William Lallemand >From 3f2c30a0f15e30b7941245d3c5b20cbd5561489f Mon Sep 17 00:00:00 2001 From: William Lallemand Date: Fri, 12 Oct 2018 10:39:54 +0200 Subject: [PATCH] BUG/MEDIUM: mworker: don't poll on LI_O_INHERITED listeners The listeners with the LI_O_INHERITED flag were deleted but not unbound which is a problem since we have a polling in the master. This patch unbind every listeners which are not require for the master, but does not close the FD of those that have a LI_O_INHERITED flag. --- src/haproxy.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/haproxy.c b/src/haproxy.c index a7b07a267..82da86222 100644 --- a/src/haproxy.c +++ b/src/haproxy.c @@ -615,13 +615,17 @@ static void mworker_cleanlisteners() for (curproxy = proxies_list; curproxy; curproxy = curproxy->next) { list_for_each_entry_safe(l, l_next, >conf.listeners, by_fe) { - /* does not close if the FD is inherited with fd@ - * from the parent process */ - if (!(l->options & (LI_O_INHERITED|LI_O_MWORKER))) -unbind_listener(l); /* remove the listener, but not those we need in the master... */ - if (!(l->options & LI_O_MWORKER)) + if (!(l->options & LI_O_MWORKER)) { +/* unbind the listener but does not close if + the FD is inherited with fd@ from the parent + process */ +if (l->options & LI_O_INHERITED) + unbind_listener_no_close(l); +else + unbind_listener(l); delete_listener(l); + } } } } -- 2.16.4
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Thu, Oct 11, 2018 at 09:31:48PM +0200, PiBa-NL wrote: > Hi Willy, William, > Hi Peter, Regarding this part: > -Connection and request counters to low when ran as regtest from > varnishtest (bug?) > It turns out that starting haproxy from varnishtest, and using -W > master-worker mode, actually creates 2 processes that are handling > traffic. That explains that a large part of connections isn't seen by > the other haproxy instance and stats showing to low amounts of > connections. Bisecting it seems to fail on this commit: b3f2be3 , > perhaps @William can you take a look at it? Not really sure when this > occurs in a 'real' environment, it doesn't seem to happen when manually > running haproxy -W, but still its strange that when varnisttest is > calling haproxy this occurs. > There was an exception in the master's code regarding the inherited FDs (fd@), which is the case with varnishtest. There is a flag which was forbidding the master to close the FD in 1.8. Now there is a polling loop in the master and I think there is a side effect where the FD is registered in the master polling loop. It's just a guess, I'll look at it tomorrow. Cheers, -- William Lallemand
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, William, Op 2-10-2018 om 3:56 schreef Willy Tarreau: it's important to cut this into pieces to figure what it's showing. A little update on this issue split in 2 pieces. -Connection and request counters to low when ran as regtest from varnishtest (bug?) It turns out that starting haproxy from varnishtest, and using -W master-worker mode, actually creates 2 processes that are handling traffic. That explains that a large part of connections isn't seen by the other haproxy instance and stats showing to low amounts of connections. Bisecting it seems to fail on this commit: b3f2be3 , perhaps @William can you take a look at it? Not really sure when this occurs in a 'real' environment, it doesn't seem to happen when manually running haproxy -W, but still its strange that when varnisttest is calling haproxy this occurs. -Request counter to high (possibly a improvement request?) The http_end_txn_clean_session(..) is called which increments the counter on a finished http response, and i was testing with 2 different methods for 1.8 and 1.9 due to missing length converter i used in my 1.9 test, which makes the comparison unfair. Sorry i didn't realize this earlier i thought it did 'more or less' the same, that seems to have been 'less'. Together with that i found the numbers odd/unexpected i assumed a bigger problem that it actually seems to be, perhaps its not even that bad, haproxy is 'preparing' for a second request over the keep-alive connection if i understand correctly. Which eventually doesn't happen, but is counted. Maybe that is a point that can be improved in a future version if time permits.? Or would it even be expected to behave like that? Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Pieter, On Thu, Oct 04, 2018 at 09:43:36PM +0200, PiBa-NL wrote: > Hi Willy, > > Op 2-10-2018 om 3:56 schreef Willy Tarreau: > > it's important to cut this into pieces > > to figure what it's showing. > > Okay, the good thing i suppose we already have a reproduction sort-off.. > > What would be the best way to try and get to the bottom of this? Add debug > output to haproxy code? Get some kind of trace? Anything in particular that > would be of interest? If you manage to figure a way to reproduce it with the "connection: close" header, that's one different bug from what I observed. However to be honest, given how much work we still have for the rework of the HTTP engine, for now I'd rather continue to work on the stuff we still have to do than spend one week figuring why this counter is sometimes wrong, so if you or anyone else wants to dig this one down to the root cause, I would really appreciate it. Thanks! Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 2-10-2018 om 3:56 schreef Willy Tarreau: it's important to cut this into pieces to figure what it's showing. Okay, the good thing i suppose we already have a reproduction sort-off.. What would be the best way to try and get to the bottom of this? Add debug output to haproxy code? Get some kind of trace? Anything in particular that would be of interest? Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Pieter, On Mon, Oct 01, 2018 at 11:06:33PM +0200, PiBa-NL wrote: > I have used the exact config above, and well it 'works fine' ?? Results > below.. both types of nc requests send to 1 running haproxy result in a > proper '21'.. OK. > But even in this case the results would be 'predicable', while with my > original test the outcome is 'randomized'.. > So despite the fact that i don't know whats really wrong, and the > environment it runs on does seem to affect things. There is more going on > that simply a counter incremented twice.. Also your results are 'higher' > than expected, while mine are 'lower' than expected.. Perhaps this single > testcase is already showing multiple issues beneath the carpet ;) ?. It's very likely :-/ That's why it's important to cut this into pieces to figure what it's showing. Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Frederic, Op 1-10-2018 om 15:15 schreef Frederic Lecaille: On 09/15/2018 02:03 AM, PiBa-NL wrote: Hi List, Willy, Hi Pieter, Sorry for this late reply. I am also sorry to tell you that -D option should not be used anymore because it as been recently broken. It adds an extra 2s delay on my PC. There is a 'newer' version of the regtest in one of the other mails ( https://www.mail-archive.com/haproxy@formilux.org/msg31318.html ). it only runs 4x10 connections. and doesn't use -D anymore and well discussing with Willy some odd behavioral differences between his and my test results.. Without this option, on my PC the test fails but after having being run during about 300ms due to the regex which does not match. CummConns (resp. CumReq) never reaches 3001 (resp. 3002). Note that as this script is highly verbose it may be required to increase the varnishtest internal buffer size (see -b varnishtest option). Thanks didn't realize that one could have helped, probably not needed anymore with the changed regtest though. I've created a regtest that checks that when concurrent connections are being handled that the connection counters are kept properly. I think it could be committed as attached. It takes a few seconds to run. It currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on FreeBSD, adding a 'nokqueue' on 1.8.13 makes it succeed though..). I think it might be a good and reproducible test to run.? Or does it need more tweaking? Thoughts appreciated :). I have shorten the "haproxy -cli" section like that: haproxy h1 -cli { send "show info" expect ~ "CumConns: 3001*\\nCumReq: 3002*" } -wait Thats a nice one, i did try a .* regex like wildcard check but found that didn't work, this will certainly help in case the test fails to produce less repeated/unneeded output and when it doesn't fail it will be a bit faster i guess, and allow for more checks on the same set of stats to take place with little effort. Note that all the Slg_1 syslog specs are not completely run because you do not wait for its termination with "syslog Slg_1 -wait" command this is why it does not fails (but it should: the syslog server does not receive 15 messages). I didn't intent to check for 15 messages as a valid outcome, but was looking more for 'some' syslog output when the testcase fails for some reason.. With the now fairly limited 40 connections it might be feasible to more properly check syslog output though. With 3000 connections it wasn't really nice when syslog was writing everything to the screen. Ill take another look at that one. Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 1-10-2018 om 4:00 schreef Willy Tarreau: On Sun, Sep 30, 2018 at 10:16:55PM +0200, PiBa-NL wrote: Hi Willy, Op 30-9-2018 om 20:38 schreef Willy Tarreau: On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote: On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote: Indeed it works with 1.8, so in that regard i 'think' the test itself is correct.. Also when disabling threads, or running only 1 client, it still works.. Then both CumConns CumReq show 11 for the first stats result. Hmmm for me it fails even without threads. That was the first thing I tried when meeting the error in fact. But I need to dig deeper. So I'm seeing that in fact the count is correct if the server connection closes first, and wrong otherwise. In fact it fails similarly both for 1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the connection count is exactly 10 times the incoming connections while the request count is exactly 20 times this count. I suspect that what happens is that the request count is increased on each connection when preparing to receive a new request. This even slightly reminds me something but I don't know where I noticed something like this, I think I saw this when reviewing the changes needed to be made to HTTP for the native internal representation. So I think it's a minor bug, but not a regression. Thanks, Willy Not sure, only difference between 100x FAILED and 100x OK is the version here. Command executed and result below. Perhaps that's just because of the OS / Scheduler used though, i assume your using some linux distro to test with, perhaps that explains part of the differences between your and my results.. I find this strange. Your config contains a comment about the length converter missing from 1.8 so I had to change it to use the deny part on 1.8. It happens that using deny here precisely is what fixed the problem for me the first time. I simplified it this way to run a manual test : global stats socket /tmp/sock1 mode 666 level admin #nbthread 3 log 127.0.0.1:5514 local0 #nokqueue defaults mode http option dontlog-normal log global option httplog timeout connect 3s timeout client 4s timeout server 15s frontend fe1 bind 127.0.0.1:8001 acl donelooping hdr(TEST) -m len 10 http-request set-header TEST "%[hdr(TEST)]x" use_backend b2 if donelooping default_backend b1 backend b1 server srv1 127.0.0.1:8001 frontend fe2 bind 127.0.0.1:8002 default_backend b2 backend b2 server srv2 127.0.0.1:8000 Then I test it this way and got the same results on all versions : $ echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001 $ echo show info | socat - /tmp/sock1 | grep Cum CumConns: 11 CumReq: 21 CumSslConns: 0 $ echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001 $ echo show info | socat - /tmp/sock1 | grep Cum CumConns: 11 CumReq: 11 CumSslConns: 0 Regards, Willy I have used the exact config above, and well it 'works fine' ?? Results below.. both types of nc requests send to 1 running haproxy result in a proper '21'.. But even in this case the results would be 'predicable', while with my original test the outcome is 'randomized'.. So despite the fact that i don't know whats really wrong, and the environment it runs on does seem to affect things. There is more going on that simply a counter incremented twice.. Also your results are 'higher' than expected, while mine are 'lower' than expected.. Perhaps this single testcase is already showing multiple issues beneath the carpet ;) ?. Regards, PiBa-NL (Pieter) # haproxy -v HA-Proxy version 1.8.14-52e4d43 2018/09/20 Copyright 2000-2018 Willy Tarreau # echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001 HTTP/1.0 503 Service Unavailable Cache-Control: no-cache Content-Type: text/html 503 Service Unavailable No server is available to handle this request. # echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001 HTTP/1.0 503 Service Unavailable Cache-Control: no-cache Content-Type: text/html 503 Service Unavailable No server is available to handle this request. # echo show info | socat - /tmp/sock1 | grep Cum CumConns: 21 CumReq: 21 CumSslConns: 0 # # # haproxy -v HA-Proxy version 1.9-dev3-27010f0 2018/09/29 Copyright 2000-2018 Willy Tarreau # echo -e "GET / HTTP/1.1\r\n\r\n"|nc 0 8001 HTTP/1.0 503 Service Unavailable Cache-Control: no-cache Content-Type: text/html 503 Service Unavailable No server is available to handle this request. # echo -e "GET / HTTP/1.1\r\nconnection: close\r\n\r\n"|nc 0 8001 HTTP/1.0 503 Service Unavailable Cache-Control: no-cache Content-Type: text/html 503 Service Unavailable No server is available to handle this request. # echo show info | socat - /tmp/sock1 | grep Cum CumConns: 21 CumReq: 21 CumSslConns: 0 #
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On 09/15/2018 02:03 AM, PiBa-NL wrote: Hi List, Willy, Hi Pieter, Sorry for this late reply. I am also sorry to tell you that -D option should not be used anymore because it as been recently broken. It adds an extra 2s delay on my PC. Without this option, on my PC the test fails but after having being run during about 300ms due to the regex which does not match. CummConns (resp. CumReq) never reaches 3001 (resp. 3002). Note that as this script is highly verbose it may be required to increase the varnishtest internal buffer size (see -b varnishtest option). I've created a regtest that checks that when concurrent connections are being handled that the connection counters are kept properly. I think it could be committed as attached. It takes a few seconds to run. It currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on FreeBSD, adding a 'nokqueue' on 1.8.13 makes it succeed though..). I think it might be a good and reproducible test to run.? Or does it need more tweaking? Thoughts appreciated :). I have shorten the "haproxy -cli" section like that: haproxy h1 -cli { send "show info" expect ~ "CumConns: 3001*\\nCumReq: 3002*" } -wait Note that all the Slg_1 syslog specs are not completely run because you do not wait for its termination with "syslog Slg_1 -wait" command this is why it does not fails (but it should: the syslog server does not receive 15 messages). Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 30-9-2018 om 20:38 schreef Willy Tarreau: On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote: On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote: Indeed it works with 1.8, so in that regard i 'think' the test itself is correct.. Also when disabling threads, or running only 1 client, it still works.. Then both CumConns CumReq show 11 for the first stats result. Hmmm for me it fails even without threads. That was the first thing I tried when meeting the error in fact. But I need to dig deeper. So I'm seeing that in fact the count is correct if the server connection closes first, and wrong otherwise. In fact it fails similarly both for 1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the connection count is exactly 10 times the incoming connections while the request count is exactly 20 times this count. I suspect that what happens is that the request count is increased on each connection when preparing to receive a new request. This even slightly reminds me something but I don't know where I noticed something like this, I think I saw this when reviewing the changes needed to be made to HTTP for the native internal representation. So I think it's a minor bug, but not a regression. Thanks, Willy Not sure, only difference between 100x FAILED and 100x OK is the version here. Command executed and result below. Perhaps that's just because of the OS / Scheduler used though, i assume your using some linux distro to test with, perhaps that explains part of the differences between your and my results.. In the end it doesn't matter much if its a bug or a regression still needs a fix ;). And well i don't know if its just the counter thats wrong, or there might be bigger consequences somewhere. if its just the counter then i guess it wouldn't hurt much to postpone a fix to a next (dev?) version. Regards, PiBa-NL (Pieter) root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc ... # top TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.128) exit=2 # top TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.135) exit=2 100 tests failed, 0 tests skipped, 0 tests passed root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v HA-Proxy version 1.9-dev3-27010f0 2018/09/29 Copyright 2000-2018 Willy Tarreau root@freebsd11:/usr/ports/net/haproxy-devel # pkg add -f haproxy-1.8.14-selfbuild-reg-tests-OK.txz Installing haproxy-1.8... package haproxy is already installed, forced install Extracting haproxy-1.8: 100% root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc 0 tests failed, 0 tests skipped, 100 tests passed root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v HA-Proxy version 1.8.14-52e4d43 2018/09/20 Copyright 2000-2018 Willy Tarreau
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote: > On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote: > > Indeed it works with 1.8, so in that regard i 'think' the test itself is > > correct.. Also when disabling threads, or running only 1 client, it still > > works.. Then both CumConns CumReq show 11 for the first stats result. > > Hmmm for me it fails even without threads. That was the first thing I > tried when meeting the error in fact. But I need to dig deeper. So I'm seeing that in fact the count is correct if the server connection closes first, and wrong otherwise. In fact it fails similarly both for 1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the connection count is exactly 10 times the incoming connections while the request count is exactly 20 times this count. I suspect that what happens is that the request count is increased on each connection when preparing to receive a new request. This even slightly reminds me something but I don't know where I noticed something like this, I think I saw this when reviewing the changes needed to be made to HTTP for the native internal representation. So I think it's a minor bug, but not a regression. Thanks, Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote: > Indeed it works with 1.8, so in that regard i 'think' the test itself is > correct.. Also when disabling threads, or running only 1 client, it still > works.. Then both CumConns CumReq show 11 for the first stats result. Hmmm for me it fails even without threads. That was the first thing I tried when meeting the error in fact. But I need to dig deeper. > > However, I'd like to merge > > the fix before merging the regtest otherwise it will kill the reg-test > > feature until we manage to get the issue fixed! > I'm not fully sure i agree on that.. While i understand that failing > reg-tests can be a pita while developing (if you run them regulary) the fact > is that currently existing tests can already already start to fail after > some major redesign of the code, a few mails back (different mailthread) i > tested like 10 commits in a row and they all suffered from different failing > tests, that would imho not be a reason to remove those tests, and they didnt > stop development. The reason is that for now we have no way to let the tests fail gracefully and report what is OK and what is not. So any error that's in the way will lead to an absolutely certain behaviour from everyone : nobody will run the tests anymore since the result will be known. Don't get me wrong, I'm willing to get as many tests as we can, but 1) we have to be sure these tests only fail for regressions and not for other reasons, and 2) we must be sure that these tests do not prevent other ones from being run nor make it impossible to observe the progress on other ones. We're still at the beginning with reg tests, and as you can see we have not even yet sorted out the requirements for some of them like threads or Lua or whatever else. I'm just asking that we don't create tests faster than we can sort them out, that's all. This probably means that we really have to work on these two main areas which are test prerequisites and synthetic reports of what worked and what failed. Ideas and proposals on this are welcome, but to be honest I can't spend as much time as I'd want on this for now given how late we are on all what remains to be done, so I really welcome discussions and help on the subject between the various actors. Thanks, Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 07:15:59PM +0200, PiBa-NL wrote: > > on a simple config, the CummConns always matches the CumReq, and when > > running this test I'm seeing random values there in the output, but I > > also see that they are retrieved before all connections are closed > But CurrConns is 0, so connections are (supposed to be?) closed? : > > h1 0.0 CLI recv|CurrConns: 0 > h1 0.0 CLI recv|CumConns: 27 > h1 0.0 CLI recv|CumReq: 27 You're totally right, I think I confused CumConns and CurrConns when looking at the output. With that said I have no idea what's going on, I'll have another look. Thanks, Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 30-9-2018 om 7:46 schreef Willy Tarreau: Hi Pieter, On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote: Hi Willy, I thought lets give those reg-test another try :) as its easy to run and dev3 just came out. All tests pass on my FreeBSD system, except this one, new reg-test attached. Pretty much the same test as previously send, but now with only 4 x 10 connections. Which should be fine for conntrack and sysctls (i hope..). It seems those stats numbers are 'off', or is my expected value not as fixed as i thought it would be? Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you spotted a regression that we have to analyse. Indeed it works with 1.8, so in that regard i 'think' the test itself is correct.. Also when disabling threads, or running only 1 client, it still works.. Then both CumConns CumReq show 11 for the first stats result. However, I'd like to merge the fix before merging the regtest otherwise it will kill the reg-test feature until we manage to get the issue fixed! I'm not fully sure i agree on that.. While i understand that failing reg-tests can be a pita while developing (if you run them regulary) the fact is that currently existing tests can already already start to fail after some major redesign of the code, a few mails back (different mailthread) i tested like 10 commits in a row and they all suffered from different failing tests, that would imho not be a reason to remove those tests, and they didnt stop development. I'm also seeing that you rely on threads, I think I noticed another test involving threads. Probably that we should have a specific directory for these ones that we can disable completely when threads are not enabled, otherwise this will also destroy tests (and make them extremely slow due to varnishtest waiting for the timeout if haproxy refuses to parse the config). A specific directory will imho not work. How should it be called? /threaded_lua_with_ssl_using_kqueue_scheduler_on_freebsd_without_absn_for_haproxy_1.9_and_higher/ ? Having varnishtest fail while waiting for a feature that was not compiled is indeed undesirable as well. So some 'smart' way of defining 'requirements' for a test will be needed so they can gracefully skip if not applicable.. I'm not sure myself how that way should look though.. On one side i think the .vtc itself might be the place to define what requirements it has, on the other the other a separate list/script including logic of what tests to run could be nice.. But then who is going to maintain that one.. I think that we should think a bit forward based on these tests. We must not let varnishtest stop on the first error but rather just log it. varnishtest can continue on error with -k Using this little mytest.sh script at the moment, this runs all tests and only failed tests produce a lot of logging..: haproxy -v varnishtest -j 16 -k -t 20 ./work/haproxy-*/reg-tests/*/*.vtc > ./mytest-result.log 2>&1 varnishtest -j 16 -k -t 20 ./haproxy_test_OK_20180831/*/*.vtc >> ./mytest-result.log 2>&1 cat ./mytest-result.log echo "" >> ./mytest-result.log haproxy -vv >> ./mytest-result.log There is also the -q parameter, but then it doesn't log anymore what tests passed and would only the failed tests will produce 1 log line.. (i do like to log what tests where executed though..) Then at the end we could produce a report of successes and failures that would be easy to diff from the previous (or expected) one. That will be particularly useful when running the tests on older releases. As an example, I had to run your test manually on 1.8 because for I-don't-know- what-reason, the one about the proxy protocol now fails while it used to work fine last week for the 1.8.14 release. That's a shame that we can't complete tests just because one randomly fails. You can continue tests. ( -k ) But better write it out to a logfile then, or perhaps combine with -l which leaves the /tmp/.vtc folder.. Thanks, Willy Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 30-9-2018 om 7:56 schreef Willy Tarreau: On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote: Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you spotted a regression that we have to analyse. However, I'd like to merge the fix before merging the regtest otherwise it will kill the reg-test feature until we manage to get the issue fixed! By the way, could you please explain in simple words the issue you've noticed ? I tried to reverse the vtc file but I don't understand the details nor what it tries to achieve. When I'm running a simple test on a simple config, the CummConns always matches the CumReq, and when running this test I'm seeing random values there in the output, but I also see that they are retrieved before all connections are closed But CurrConns is 0, so connections are (supposed to be?) closed? : h1 0.0 CLI recv|CurrConns: 0 h1 0.0 CLI recv|CumConns: 27 h1 0.0 CLI recv|CumReq: 27 , so I'm not even sure the test is correct :-/ Thanks, Willy What i'm trying to achieve is, well.. testing for regressions that are not yet known to exist on the current stable version. So what this test does in short: It makes 4 clients simultaneously send a request to a threaded haproxy, which in turn connects 10x backend to frontend and then sends the request to the s1 server. This with the intended purpose of having several connections started and broken up as fast as haproxy can process them while trying to have a high probability of adding/removing items from lists/counters from different threads thus possibly creating problems if some lock/sync isn't done correctly. After firing a few requests it also verifies the expected counts, and results where possible.. History: Ive been bit a few times with older releases by corruption occurring inside the POST data when uploading large (500MB+) files to a server behind haproxy. After a few megabytes are passed correctly the resulting file would contain differences from their original when compared, the upload 'seemed' to succeed though. (this was then solved by installing a newer haproxy build..).. Also sometimes threads have locked up or crashed things. Or kqueue scheduler turned out to behave differently than others.. Ive been trying to test such things manually but found i always forget to run some test. This is why i really like the concept of having a set of defined tests that validate haproxy is working 'properly', on the OS i run it on.. Also when some issue i ran into gets fixed i tend to run -dev builds on my production environment for a while, and well its nice to know that other functionality still works as it used to.. With writing this test i initially started with the idea of automatically testing a large file transfer through haproxy, but then thought where / how to leave such a file, so i thought of transferring a 'large' header with increasing size 'might' trigger a similar condition.. Though in hindsight that might not actually test the same code paths.. I created that test with 1 byte growth in the header together with 4000 connections didn't quite achieve that initial big file simulation, but still i thought it ended up to be a nice test. So submitted it a while back ;) .. Anyhow haproxy wasn't capable of doing much when dev2 was tagged so i wasnt to worried the test failed at that time.. And you announced dev2 as such as well, so that was okay. And perhaps the issue found then would solve itself when further fixes on top of dev2 were added ;). Anyhow with dev3 i hoped all regressions would be fixed, and found this one still failed on 1.9dev3. So it tuned the numbers in the previous submitted regtest down a little to avoid conntrack/sysctl default limits, while still failing the test 'reliably'.. I'm not sure what exactly is going on, or how bad it is that these numbers don't match up anymore.. Maybe its only the counter thats not updated in a thread safe way, perhaps there is a bigger issue lurking with sync points and whatnot..? Either way the test should pass as i understand it, the 4 defined varnish clients got their answer back and Currconns = 0, also adding a 3 second delay between waiting for the clients and checking the stats does not fix it... And as youve checked with 1.8 it does pass. Though that to could perhaps be a coincidence, maybe now things are processed even faster now but in different order so the test fails for the wrong reason.?. Hope that makes some sense in my thought process :). Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote: > Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you > spotted a regression that we have to analyse. However, I'd like to merge > the fix before merging the regtest otherwise it will kill the reg-test > feature until we manage to get the issue fixed! By the way, could you please explain in simple words the issue you've noticed ? I tried to reverse the vtc file but I don't understand the details nor what it tries to achieve. When I'm running a simple test on a simple config, the CummConns always matches the CumReq, and when running this test I'm seeing random values there in the output, but I also see that they are retrieved before all connections are closed, so I'm not even sure the test is correct :-/ Thanks, Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Pieter, On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote: > Hi Willy, > > I thought lets give those reg-test another try :) as its easy to run and > dev3 just came out. > All tests pass on my FreeBSD system, except this one, new reg-test attached. > > Pretty much the same test as previously send, but now with only 4 x 10 > connections. Which should be fine for conntrack and sysctls (i hope..). It > seems those stats numbers are 'off', or is my expected value not as fixed as > i thought it would be? Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you spotted a regression that we have to analyse. However, I'd like to merge the fix before merging the regtest otherwise it will kill the reg-test feature until we manage to get the issue fixed! I'm also seeing that you rely on threads, I think I noticed another test involving threads. Probably that we should have a specific directory for these ones that we can disable completely when threads are not enabled, otherwise this will also destroy tests (and make them extremely slow due to varnishtest waiting for the timeout if haproxy refuses to parse the config). I think that we should think a bit forward based on these tests. We must not let varnishtest stop on the first error but rather just log it. Then at the end we could produce a report of successes and failures that would be easy to diff from the previous (or expected) one. That will be particularly useful when running the tests on older releases. As an example, I had to run your test manually on 1.8 because for I-don't-know- what-reason, the one about the proxy protocol now fails while it used to work fine last week for the 1.8.14 release. That's a shame that we can't complete tests just because one randomly fails. Thanks, Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, I thought lets give those reg-test another try :) as its easy to run and dev3 just came out. All tests pass on my FreeBSD system, except this one, new reg-test attached. Pretty much the same test as previously send, but now with only 4 x 10 connections. Which should be fine for conntrack and sysctls (i hope..). It seems those stats numbers are 'off', or is my expected value not as fixed as i thought it would be? Tested with: HA-Proxy version 1.9-dev3-27010f0 2018/09/29 FreeBSD freebsd11 11.1-RELEASE Results: h1 0.0 CLI recv|CumConns: 33 h1 0.0 CLI recv|CumReq: 65 h1 0.0 CLI expect failed ~ "CumConns: 41" If my 'expect' is correct, would the patch be suitable for inclusion with the other reg-tests this way? If you want to rename loadtest, to heavytest, or any other tweaks please feel free to do so. Regards, PiBa-NL (Pieter) Op 20-9-2018 om 22:25 schreef PiBa-NL: Hi Willy, Op 20-9-2018 om 13:56 schreef Willy Tarreau: For me the test produces like 345 lines of output as attached. which seems not to bad (if the test succeeds). It's already far too much for a user. Well those 345 lines are if it succeeds while in 'verbose' mode, in 'normal' mode it only produces 1 line of output when successful. Pretty much all tests produce 100+ lines of 'logging' if they fail for some reason. From what ive seen varnishtest either produces a bulk of logging on a failure, or it only logs the failures. There isn't much in between. As for all the rest of the email, thanks for your elaborate response :). Regards, PiBa-NL (Pieter) From 28377ffe246ed1db0e0d898fa6263eccdc68c490 Mon Sep 17 00:00:00 2001 From: PiBa-NL Date: Sat, 15 Sep 2018 01:51:54 +0200 Subject: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters after running a 4 x 10 looping requests --- reg-tests/loadtest/b0-loadtest.vtc | 99 ++ 1 file changed, 99 insertions(+) create mode 100644 reg-tests/loadtest/b0-loadtest.vtc diff --git a/reg-tests/loadtest/b0-loadtest.vtc b/reg-tests/loadtest/b0-loadtest.vtc new file mode 100644 index ..590924e1 --- /dev/null +++ b/reg-tests/loadtest/b0-loadtest.vtc @@ -0,0 +1,99 @@ +# Checks that request and connection counters are properly kept + +varnishtest "Connection counters check" +feature ignore_unknown_macro + +server s1 { +rxreq +expect req.http.TESTsize == 10 +txresp +} -repeat 4 -start + +syslog Slg_1 -level notice { +recv +} -repeat 15 -start + +haproxy h1 -W -conf { + global +nbthread 3 +log ${Slg_1_addr}:${Slg_1_port} local0 +#nokqueue + + defaults +mode http +option dontlog-normal +log global +option httplog +timeout connect 3s +timeout client 4s +timeout server 15s + + frontend fe1 +bind "fd@${fe_1}" +acl donelooping hdr(TEST) -m len 10 +http-request set-header TEST "%[hdr(TEST)]x" +use_backend b2 if donelooping +default_backend b1 + + backend b1 +server srv1 ${h1_fe_1_addr}:${h1_fe_1_port} + + frontend fe2 +bind "fd@${fe_2}" +default_backend b2 + + backend b2 +# haproxy 1.8 does not have the ,length converter. +#acl OK hdr(TEST) -m len 10 +#http-request deny deny_status 200 if OK +#http-request deny deny_status 400 + +# haproxy 1.9 does have a ,length converter. +http-request set-header TESTsize "%[hdr(TEST),length]" +http-request del-header TEST +server srv2 ${s1_addr}:${s1_port} + +} -start + +barrier b1 cond 4 + +client c1 -connect ${h1_fe_1_sock} { + timeout 17 + barrier b1 sync +txreq -url "/" +rxresp +expect resp.status == 200 +} -start +client c2 -connect ${h1_fe_1_sock} { + timeout 17 + barrier b1 sync +txreq -url "/" +rxresp +expect resp.status == 200 +} -start +client c3 -connect ${h1_fe_1_sock} { + timeout 17 + barrier b1 sync +txreq -url "/" +rxresp +expect resp.status == 200 +} -start +client c4 -connect ${h1_fe_1_sock} { + timeout 17 + barrier b1 sync +txreq -url "/" +rxresp +expect resp.status == 200 +} -start + +client c1 -wait +client c2 -wait +client c3 -wait +client c4 -wait + +haproxy h1 -cli { +send "show info" +expect ~ "CumConns: 41" +send "show info" +expect ~ "CumReq: 42" +} -- 2.18.0.windows.1
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 20-9-2018 om 13:56 schreef Willy Tarreau: For me the test produces like 345 lines of output as attached. which seems not to bad (if the test succeeds). It's already far too much for a user. Well those 345 lines are if it succeeds while in 'verbose' mode, in 'normal' mode it only produces 1 line of output when successful. Pretty much all tests produce 100+ lines of 'logging' if they fail for some reason. From what ive seen varnishtest either produces a bulk of logging on a failure, or it only logs the failures. There isn't much in between. As for all the rest of the email, thanks for your elaborate response :). Regards, PiBa-NL (Pieter)
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Pieter, On Thu, Sep 20, 2018 at 12:48:35AM +0200, PiBa-NL wrote: > Test takes like 5 seconds to run here, and while that is a bit long if you > get a hundred more similar tests and want to continue tweaking developments > while running tests in between. It wouldn't hurt to run such a (series) of > longer tests before creating a patch and submitting it for inclusion on the > official git repository in my opinion, or before a release. Definitely, however those run before a release should be almost 100% system-agnostic. Having to prepare the system and tune it properly for the test not to fail is going to cause a headache every time there is a failure because it will mean fixing the root cause and re-run the whole suite, which precisely is what will make all this suite not to be used at all anymore. This is why I'm really picky on the reliability and the speed of these tests. It should be stupid-proof, with me being the stupid (and believe me when it comes to doing repetitive stuff, I'm among the stupidest persons you have ever met). > My attempt was > to test a bit differently than just looking for regressions of known fixed > bugs, and putting a little load on haproxy so that threads and simultaneous > actions 'might' get into conflicts/locks/stuff which might by chance, show > up, which is why i choose to go a little higher on the number of round-trips > with ever slightly increasing payload.. I really understand the point and I think it is valid to a certain extent. But that's really not a thing to run by default. And I want to encourage us (including me) to run reg tests from time to time. If you know that some of them will take too long, you'll quickly end up avoiding all the ones you can easily avoid using a single command (eg: playing with the LEVEL variable, or not running it at all). > For me the test produces like 345 lines of output as attached. which seems > not to bad (if the test succeeds). It's already far too much for a user. I should only know if it works or not, otherwise it hides the output of all other ones (which is what happened). We must not have to parse the output to know if we didn't break anything, we just have to check that it looks normal. Here's what make reg-tests gives me on 1.8 : willy@wtap:haproxy$ time sh make-reg-tests-1.8 #top TEST reg-tests/lua/b2.vtc passed (0.159) #top TEST reg-tests/lua/b1.vtc passed (0.122) #top TEST reg-tests/lua/b0.vtc passed (0.110) #top TEST reg-tests/lua/b3.vtc passed (0.137) #top TEST reg-tests/connection/b0.vtc passed (0.172) #top TEST reg-tests/server/b0.vtc passed (0.110) #top TEST reg-tests/spoe/b0.vtc passed (0.008) #top TEST reg-tests/ssl/b0.vtc passed (0.139) #top TEST reg-tests/stick-table/b1.vtc passed (0.110) #top TEST reg-tests/stick-table/b0.vtc passed (0.110) #top TEST reg-tests/log/b0.vtc passed (0.125) #top TEST reg-tests/seamless-reload/b0.vtc passed (0.123) real0m1.713s user0m0.316s sys 0m0.068s As you can see there's no output to parse, it *visually* looks correct. > Besides the 2 instances of cli output > for stats, its seems not that much different from other tests.. > And with 1.8.13 on FreeBSD (without qkueue) it succeeds: # top TEST > ./test/b0-loadtest.vtc passed (4.800 OK then you get a valid output there. It's here that it's ugly. But we spend enough time analysing bugs, I really refuse to spend extra time fixing bugs in tests supposed to help detect bugs, otherwise it becomes recursive... > Taking into account conntrack and ulimit, would that mean we can never > 'reg-test' if haproxy can really handle like 1 connections without > issue? 10k conns definitely is way beyond what you can expect from a non-root user on a regular shell. I run most of my configs with "maxconn 400" because that's less than 1024 FDs once you add the extra FDs for listeners and checks. Anything beyond that will depend on the users' setup and becomes tricky. And in this case it's more a matter of stress-testing the system, and we can have stress-test procedures or tools (just like we all do on our respective setups with different tools). It's just that one has to know in advance that some preparation is needed (typically killing a possible browser, unloading some modules, checking that there's still memory left, maybe adding some addresses to the loopback, etc). So it's a different approach and it definitely is out of the scope of automatizing detection of potential regressions during development. > Or should the environment be configured by the test?? ,that seems > very tricky at least and probably would be undesirable.. No definitely it would be even worse. For sure those used to run "curl | sudo bash" will have no problem letting it configure their system, but I'm not among such people and I appreciate a lot that my machine works every morning when I want to
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy, Op 19-9-2018 om 7:36 schreef Willy Tarreau: Hi Pieter, I took some time this morning to give it a test. For now it fails here, after dumping 2200 lines of not really usable output that I didn't investigate. From what I'm seeing it seems to moderately stress the local machine so it has many reasons for failing (lack of source ports, improperly tuned conntrack, ulimit, etc), and it takes far too long a time to be usable as a default test, or this one alone will be enough to discourage anyone from regularly running "make reg-tests". Test takes like 5 seconds to run here, and while that is a bit long if you get a hundred more similar tests and want to continue tweaking developments while running tests in between. It wouldn't hurt to run such a (series) of longer tests before creating a patch and submitting it for inclusion on the official git repository in my opinion, or before a release.?. My attempt was to test a bit differently than just looking for regressions of known fixed bugs, and putting a little load on haproxy so that threads and simultaneous actions 'might' get into conflicts/locks/stuff which might by chance, show up, which is why i choose to go a little higher on the number of round-trips with ever slightly increasing payload.. For me the test produces like 345 lines of output as attached. which seems not to bad (if the test succeeds).. Besides the 2 instances of cli output for stats, its seems not that much different from other tests.. And with 1.8.13 on FreeBSD (without qkueue) it succeeds: # top TEST ./test/b0-loadtest.vtc passed (4.800 Taking into account conntrack and ulimit, would that mean we can never 'reg-test' if haproxy can really handle like 1 connections without issue? Or should the environment be configured by the test?? ,that seems very tricky at least and probably would be undesirable.. (i just today figured i could run reg-tests also on my production box to compare if a new build showed issues there that my test-box might not.. i wouldn't want system settings to changed by a reg-test run..) I think we should create a distinct category for such tests Agreed, which is why i used the currently non-existing '/loadtest/' folder. If '/heavy/' is better thats of course alright for me to. , because I see some value in it when it's made to reproduce a very specific class of issues which is very unlikely to trigger unless someone is working on it. In this case it is not a problem that it dumps a lot of output, as it will be useful for the person knowing what to look for there. Probably that such tests should be run by hand and dump their log into a related file. Imagine for example that we would have this : $ make reg-tests/heavy/conn-counter-3000-req.log I'm not exactly sure..("make: don't know how to make reg-tests. Stop"). i would still like to have a way to run all 'applicable' tests with 1 command, even if it takes a hour or so to verify haproxy is working 'perfectly'. But like abns@ tests cant work on FreeBSD, but they should not 'fail', perhaps get skipped automatically though.?. Anyhow thats a question for my other mail-topic ( https://www.mail-archive.com/haproxy@formilux.org/msg31195.html ) It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and would produce the log into reg-tests/heavy/conn-counter-3000-req.log. We could use a similar thing to test for select/poll/epoll/kqueue, to test for timeouts, race conditions (eg show sess in threads). This is very likely something to brainstorm about. You might have other ideas related to certain issues you faced in the past. Fred is unavailable this week but I'd be very interested in his opinion on such things. Thus for now I'm not applying your patch, but I'm interested in seeing what can be done with it. Okay no problem :) , ill keep running this particular test myself for the moment, it 'should' be able to pass normally.. (On my environment anyhow..) Thanks, Willy Thanks for your comments, and thoughts. I'm interested in Fred's and anyone elses opinion ;) , and well maybe this particular test-case could be replaced by something simpler/faster/ with more or less the same likelihood of catching yet unknown issues..? Looking forward to reactions :) . Thanks and regards, PiBa-NL (Pieter) top 0.0 extmacro def pwd=/usr/ports/net/haproxy-devel top 0.0 extmacro def localhost=127.0.0.1 top 0.0 extmacro def bad_backend=127.0.0.1 58530 top 0.0 extmacro def bad_ip=192.0.2.255 top 0.0 macro def testdir=/usr/ports/net/haproxy-devel/./test top 0.0 macro def tmpdir=/tmp/vtc.35996.290f74a9 *top 0.0 TEST ./test/b0-loadtest.vtc starting ** top 0.0 === varnishtest "Seamless reload issue with abns sockets" *top 0.0 TEST Seamless reload issue with abns sockets ** top 0.0 === feature ignore_unknown_macro ** top 0.0 === server s1 { ** s10.0 Starting server s10.0 macro def
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Pieter, On Sat, Sep 15, 2018 at 02:03:45AM +0200, PiBa-NL wrote: > Hi List, Willy, > > I've created a regtest that checks that when concurrent connections are > being handled that the connection counters are kept properly. > > I think it could be committed as attached. It takes a few seconds to run. It > currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on FreeBSD, > adding a 'nokqueue' on 1.8.13 makes it succeed though..). > > I think it might be a good and reproducible test to run.? > > Or does it need more tweaking? Thoughts appreciated :). I took some time this morning to give it a test. For now it fails here, after dumping 2200 lines of not really usable output that I didn't investigate. From what I'm seeing it seems to moderately stress the local machine so it has many reasons for failing (lack of source ports, improperly tuned conntrack, ulimit, etc), and it takes far too long a time to be usable as a default test, or this one alone will be enough to discourage anyone from regularly running "make reg-tests". I think we should create a distinct category for such tests, because I see some value in it when it's made to reproduce a very specific class of issues which is very unlikely to trigger unless someone is working on it. In this case it is not a problem that it dumps a lot of output, as it will be useful for the person knowing what to look for there. Probably that such tests should be run by hand and dump their log into a related file. Imagine for example that we would have this : $ make reg-tests/heavy/conn-counter-3000-req.log It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and would produce the log into reg-tests/heavy/conn-counter-3000-req.log. We could use a similar thing to test for select/poll/epoll/kqueue, to test for timeouts, race conditions (eg show sess in threads). This is very likely something to brainstorm about. You might have other ideas related to certain issues you faced in the past. Fred is unavailable this week but I'd be very interested in his opinion on such things. Thus for now I'm not applying your patch, but I'm interested in seeing what can be done with it. Thanks, Willy