Re: [PATCH] DOC: clarify force-private-cache is an option
On Mon, Oct 01, 2018 at 02:00:16AM +0200, Lukas Tribus wrote:
> "boolean" may confuse users into thinking they need to provide
> additional arguments, like false or true. This is a simple option
> like many others, so let's not confuse the users with internals.
>
> Also fixes an additional typo.
>
> Should be backported to 1.8 and 1.7.

Applied, thank you Lukas.
Willy
[PATCH] DOC: clarify force-private-cache is an option
"boolean" may confuse users into thinking they need to provide
additional arguments, like false or true. This is a simple option
like many others, so let's not confuse the users with internals.

Also fixes an additional typo.

Should be backported to 1.8 and 1.7.
---
 doc/configuration.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 336ef1f..d890b0b 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -1660,7 +1660,7 @@ tune.ssl.cachesize
   this value to 0 disables the SSL session cache.
 
 tune.ssl.force-private-cache
-  This boolean disables SSL session cache sharing between all processes. It
+  This option disables SSL session cache sharing between all processes. It
   should normally not be used since it will force many renegotiations due to
   clients hitting a random process. But it may be required on some operating
   systems where none of the SSL cache synchronization method may be used. In
@@ -6592,7 +6592,7 @@ option smtpchk
                  yes   |    no    |   yes  |   yes
   Arguments :
     <hello>   is an optional argument. It is the "hello" command to use. It can
-              be either "HELO" (for SMTP) or "EHLO" (for ESTMP). All other
+              be either "HELO" (for SMTP) or "EHLO" (for ESMTP). All other
               values will be turned into the default command ("HELO").
 
     <domain>  is the domain name to present to the server. It may only be
-- 
2.7.4
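For readers of the archive, the documentation change above can be illustrated with a minimal sketch of how the option is written: it is a simple flag with no true/false argument. The surrounding global-section values here are assumptions for the demo, not part of the patch.

```shell
# Hypothetical minimal global section using the option as a plain flag,
# exactly as the patched documentation describes (no boolean argument).
cat > haproxy-sketch.cfg <<'EOF'
global
    tune.ssl.cachesize 20000
    tune.ssl.force-private-cache
EOF
grep -c 'force-private-cache' haproxy-sketch.cfg
```

The option appears on a line of its own, with nothing after it.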
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy,

Op 30-9-2018 om 20:38 schreef Willy Tarreau:
> On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:
> > On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:
> > > Indeed it works with 1.8, so in that regard i 'think' the test itself
> > > is correct.. Also when disabling threads, or running only 1 client, it
> > > still works.. Then both CumConns and CumReq show 11 for the first
> > > stats result.
> >
> > Hmmm for me it fails even without threads. That was the first thing I
> > tried when meeting the error in fact. But I need to dig deeper.
>
> So I'm seeing that in fact the count is correct if the server connection
> closes first, and wrong otherwise. In fact it fails similarly both for
> 1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
> connection count is exactly 10 times the incoming connections while the
> request count is exactly 20 times this count. I suspect that what happens
> is that the request count is increased on each connection when preparing
> to receive a new request. This even slightly reminds me of something but
> I don't know where I noticed something like this; I think I saw it when
> reviewing the changes needed to be made to HTTP for the native internal
> representation. So I think it's a minor bug, but not a regression.
>
> Thanks,
> Willy

Not sure; the only difference between 100x FAILED and 100x OK is the
version here. Command executed and result below. Perhaps that's just
because of the OS / scheduler used, though; I assume you're using some
Linux distro to test with, and perhaps that explains part of the
differences between your results and mine. In the end it doesn't matter
much whether it's a bug or a regression: it still needs a fix ;). And I
don't know if it's just the counter that's wrong, or whether there might be
bigger consequences somewhere. If it's just the counter, then I guess it
wouldn't hurt much to postpone a fix to a next (dev?) version.

Regards,
PiBa-NL (Pieter)

root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc
...
#     top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.128) exit=2
#     top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.135) exit=2
100 tests failed, 0 tests skipped, 0 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau
root@freebsd11:/usr/ports/net/haproxy-devel # pkg add -f haproxy-1.8.14-selfbuild-reg-tests-OK.txz
Installing haproxy-1.8...
package haproxy is already installed, forced install
Extracting haproxy-1.8: 100%
root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc
0 tests failed, 0 tests skipped, 100 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:
> On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:
> > Indeed it works with 1.8, so in that regard i 'think' the test itself is
> > correct.. Also when disabling threads, or running only 1 client, it
> > still works.. Then both CumConns and CumReq show 11 for the first stats
> > result.
>
> Hmmm for me it fails even without threads. That was the first thing I
> tried when meeting the error in fact. But I need to dig deeper.

So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for 1.6,
1.7, 1.8 and 1.9, with and without threads. I'm seeing that the connection
count is exactly 10 times the incoming connections while the request count
is exactly 20 times this count. I suspect that what happens is that the
request count is increased on each connection when preparing to receive a
new request. This even slightly reminds me of something, but I don't know
where I noticed something like this; I think I saw it when reviewing the
changes needed to be made to HTTP for the native internal representation.
So I think it's a minor bug, but not a regression.

Thanks,
Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:
> Indeed it works with 1.8, so in that regard i 'think' the test itself is
> correct.. Also when disabling threads, or running only 1 client, it still
> works.. Then both CumConns and CumReq show 11 for the first stats result.

Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.

> > However, I'd like to merge the fix before merging the regtest otherwise
> > it will kill the reg-test feature until we manage to get the issue
> > fixed!
>
> I'm not fully sure i agree on that.. While i understand that failing
> reg-tests can be a pita while developing (if you run them regularly), the
> fact is that currently existing tests can already start to fail after
> some major redesign of the code. A few mails back (different mail thread)
> i tested like 10 commits in a row and they all suffered from different
> failing tests; that would imho not be a reason to remove those tests, and
> they didn't stop development.

The reason is that for now we have no way to let the tests fail gracefully
and report what is OK and what is not. So any error that's in the way will
lead to an absolutely certain behaviour from everyone: nobody will run the
tests anymore since the result will be known. Don't get me wrong, I'm
willing to get as many tests as we can, but 1) we have to be sure these
tests only fail for regressions and not for other reasons, and 2) we must
be sure that these tests do not prevent other ones from being run nor make
it impossible to observe the progress on other ones.

We're still at the beginning with reg tests, and as you can see we have not
even yet sorted out the requirements for some of them like threads or Lua
or whatever else. I'm just asking that we don't create tests faster than we
can sort them out, that's all.

This probably means that we really have to work on these two main areas,
which are test prerequisites and synthetic reports of what worked and what
failed. Ideas and proposals on this are welcome, but to be honest I can't
spend as much time as I'd want on this for now given how late we are on all
that remains to be done, so I really welcome discussions and help on the
subject between the various actors.

Thanks,
Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
On Sun, Sep 30, 2018 at 07:15:59PM +0200, PiBa-NL wrote:
> > on a simple config, the CumConns always matches the CumReq, and when
> > running this test I'm seeing random values there in the output, but I
> > also see that they are retrieved before all connections are closed
>
> But CurrConns is 0, so connections are (supposed to be?) closed? :
>
>     h1    0.0 CLI recv|CurrConns: 0
>     h1    0.0 CLI recv|CumConns: 27
>     h1    0.0 CLI recv|CumReq: 27

You're totally right, I think I confused CumConns and CurrConns when
looking at the output. With that said I have no idea what's going on, I'll
have another look.

Thanks,
Willy
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy,

Op 30-9-2018 om 7:46 schreef Willy Tarreau:
> Hi Pieter,
>
> On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote:
> > Hi Willy,
> >
> > I thought let's give those reg-tests another try :) as it's easy to run
> > and dev3 just came out. All tests pass on my FreeBSD system, except this
> > one; new reg-test attached. Pretty much the same test as previously
> > sent, but now with only 4 x 10 connections, which should be fine for
> > conntrack and sysctls (I hope..). It seems those stats numbers are
> > 'off', or is my expected value not as fixed as I thought it would be?
>
> Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
> spotted a regression that we have to analyse.

Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct. Also when disabling threads, or running only 1 client, it still
works; then both CumConns and CumReq show 11 for the first stats result.

> However, I'd like to merge the fix before merging the regtest otherwise
> it will kill the reg-test feature until we manage to get the issue fixed!

I'm not fully sure I agree on that. While I understand that failing
reg-tests can be a pita while developing (if you run them regularly), the
fact is that currently existing tests can already start to fail after some
major redesign of the code. A few mails back (different mail thread) I
tested like 10 commits in a row and they all suffered from different
failing tests; that would imho not be a reason to remove those tests, and
they didn't stop development.

> I'm also seeing that you rely on threads. I think I noticed another test
> involving threads. Probably we should have a specific directory for these
> ones that we can disable completely when threads are not enabled,
> otherwise this will also destroy tests (and make them extremely slow due
> to varnishtest waiting for the timeout if haproxy refuses to parse the
> config).

A specific directory will imho not work. How should it be called?
/threaded_lua_with_ssl_using_kqueue_scheduler_on_freebsd_without_absn_for_haproxy_1.9_and_higher/ ?

Having varnishtest fail while waiting for a feature that was not compiled
is indeed undesirable as well. So some 'smart' way of defining
'requirements' for a test will be needed so they can gracefully skip if not
applicable. I'm not sure myself how that should look, though. On one side I
think the .vtc itself might be the place to define what requirements it
has; on the other, a separate list/script including logic of what tests to
run could be nice. But then who is going to maintain that one..

> I think that we should think a bit forward based on these tests. We must
> not let varnishtest stop on the first error but rather just log it.

varnishtest can continue on error with -k. I'm using this little mytest.sh
script at the moment; it runs all tests and only failed tests produce a lot
of logging:

  haproxy -v
  varnishtest -j 16 -k -t 20 ./work/haproxy-*/reg-tests/*/*.vtc > ./mytest-result.log 2>&1
  varnishtest -j 16 -k -t 20 ./haproxy_test_OK_20180831/*/*.vtc >> ./mytest-result.log 2>&1
  cat ./mytest-result.log
  echo "" >> ./mytest-result.log
  haproxy -vv >> ./mytest-result.log

There is also the -q parameter, but then it doesn't log anymore what tests
passed, and only the failed tests will produce 1 log line. (I do like to
log what tests were executed, though.)

> Then at the end we could produce a report of successes and failures that
> would be easy to diff from the previous (or expected) one. That will be
> particularly useful when running the tests on older releases. As an
> example, I had to run your test manually on 1.8 because for
> I-don't-know-what-reason, the one about the proxy protocol now fails
> while it used to work fine last week for the 1.8.14 release. That's a
> shame that we can't complete tests just because one randomly fails.

You can continue tests ( -k ), but better write it out to a logfile then,
or perhaps combine with -l which leaves the /tmp/.vtc folder..

> Thanks,
> Willy

Regards,
PiBa-NL (Pieter)
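The diffable pass/fail summary discussed above could be sketched like this. This is a hypothetical helper, not something shipped with haproxy or varnishtest: it assumes each test's name and exit code have already been collected (for example from per-file varnishtest runs), and the file names are made up for the demo.

```shell
# summarize: turn "testname exitcode" lines into a sorted, diffable report
# that can be compared against a previous or expected run with plain diff.
summarize() {
    sort | while read -r name code; do
        if [ "$code" -eq 0 ]; then
            echo "PASS $name"
        else
            echo "FAIL $name"
        fi
    done
}

# Example input as it might be collected from individual test runs:
printf '%s\n' \
    'reg-tests/loadtest/b0-loadtest.vtc 2' \
    'reg-tests/ssl/b0.vtc 0' | summarize
```

Because the report is sorted and one line per test, `diff expected.txt current.txt` immediately shows which tests changed state between releases.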
Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
Hi Willy,

Op 30-9-2018 om 7:56 schreef Willy Tarreau:
> On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote:
> > Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
> > spotted a regression that we have to analyse. However, I'd like to
> > merge the fix before merging the regtest otherwise it will kill the
> > reg-test feature until we manage to get the issue fixed!
>
> By the way, could you please explain in simple words the issue you've
> noticed? I tried to reverse the vtc file but I don't understand the
> details nor what it tries to achieve. When I'm running a simple test on a
> simple config, the CumConns always matches the CumReq, and when running
> this test I'm seeing random values there in the output, but I also see
> that they are retrieved before all connections are closed

But CurrConns is 0, so connections are (supposed to be?) closed? :

    h1    0.0 CLI recv|CurrConns: 0
    h1    0.0 CLI recv|CumConns: 27
    h1    0.0 CLI recv|CumReq: 27

> , so I'm not even sure the test is correct :-/
>
> Thanks,
> Willy

What I'm trying to achieve is, well.. testing for regressions that are not
yet known to exist on the current stable version.

So what this test does, in short: it makes 4 clients simultaneously send a
request to a threaded haproxy, which in turn connects 10x backend to
frontend and then sends the request to the s1 server. This with the
intended purpose of having several connections started and broken up as
fast as haproxy can process them, while trying to have a high probability
of adding/removing items from lists/counters from different threads, thus
possibly creating problems if some lock/sync isn't done correctly. After
firing a few requests it also verifies the expected counts and results
where possible.

History: I've been bitten a few times with older releases by corruption
occurring inside the POST data when uploading large (500MB+) files to a
server behind haproxy. After a few megabytes were passed correctly, the
resulting file would contain differences from the original when compared;
the upload 'seemed' to succeed though. (This was then solved by installing
a newer haproxy build.) Also, sometimes threads have locked up or crashed
things, or the kqueue scheduler turned out to behave differently from
others. I've been trying to test such things manually but found I always
forget to run some test. This is why I really like the concept of having a
set of defined tests that validate haproxy is working 'properly' on the OS
I run it on. Also, when some issue I ran into gets fixed I tend to run -dev
builds on my production environment for a while, and it's nice to know that
other functionality still works as it used to.

With writing this test I initially started with the idea of automatically
testing a large file transfer through haproxy, but then thought where/how
to leave such a file, so I thought transferring a 'large' header with
increasing size 'might' trigger a similar condition, though in hindsight
that might not actually test the same code paths. The test I created, with
1 byte growth in the header together with 4000 connections, didn't quite
achieve that initial big-file simulation, but still I thought it ended up
to be a nice test, so I submitted it a while back ;). Anyhow, haproxy
wasn't capable of doing much when dev2 was tagged so I wasn't too worried
the test failed at that time, and you announced dev2 as such as well, so
that was okay. And perhaps the issue found then would solve itself when
further fixes on top of dev2 were added ;).

Anyhow, with dev3 I hoped all regressions would be fixed, and found this
one still failed on 1.9-dev3. So I tuned the numbers in the previously
submitted regtest down a little to avoid conntrack/sysctl default limits,
while still failing the test 'reliably'.

I'm not sure what exactly is going on, or how bad it is that these numbers
don't match up anymore. Maybe it's only the counter that's not updated in a
thread-safe way; perhaps there is a bigger issue lurking with sync points
and whatnot..? Either way, the test should pass as I understand it: the 4
defined varnish clients got their answer back and CurrConns = 0, and adding
a 3 second delay between waiting for the clients and checking the stats
does not fix it. And as you've checked, with 1.8 it does pass. Though that
too could perhaps be a coincidence; maybe things are processed even faster
now but in a different order, so the test fails for the wrong reason?

Hope that makes some sense in my thought process :).

Regards,
PiBa-NL (Pieter)
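The counter comparison the test performs can be illustrated on a captured `show info` dump. The sample values below are the ones quoted in the thread; querying a live stats socket (e.g. via socat) is left out so the sketch stays self-contained.

```shell
# Compare CumConns and CumReq from a captured "show info" dump: on a simple
# config the two cumulative counters are expected to match once CurrConns
# has dropped back to 0.
dump='CurrConns: 0
CumConns: 27
CumReq: 27'

conns=$(printf '%s\n' "$dump" | awk -F': ' '/^CumConns:/ {print $2}')
reqs=$(printf '%s\n' "$dump" | awk -F': ' '/^CumReq:/ {print $2}')

if [ "$conns" = "$reqs" ]; then
    echo "counters match"
else
    echo "counters differ ($conns vs $reqs)"
fi
```

Against a live instance, the dump would instead come from something like a CLI "show info" request over the stats socket, which is exactly what the vtc file checks.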
Re: Allow configuration of pcre-config path
On Sun, Sep 30, 2018 at 03:54:14PM +0200, Fabrice Fontaine wrote:
> OK, thanks for your quick review, see attached patch; I made two variables
> PCRE_CONFIG and PCRE2_CONFIG.

Thank you, now applied.
Willy
Re: Allow configuration of pcre-config path
Dear Willy,

Le dim. 30 sept. 2018 à 14:38, Willy Tarreau a écrit :
> Hello Fabrice,
>
> On Sun, Sep 30, 2018 at 12:20:55PM +0200, Fabrice Fontaine wrote:
> > Dear all,
> >
> > I added haproxy to buildroot and to do so, I added a way of configuring
> > the path of pcre-config and pcre2-config.
>
> This looks OK however I think from a users' perspective that it would be
> better to let the user specify the path to the pcre-config command
> instead of only the directory containing it. This gives more flexibility,
> for example allowing to have a different name than "pcre-config". Maybe
> you could simply call that variable "PCRE_CONFIG" in this case.

OK, thanks for your quick review, see attached patch; I made two variables
PCRE_CONFIG and PCRE2_CONFIG.

> > So, please find attached a patch. As this is my first contribution to
> > haproxy, please excuse me if I made any mistakes.
>
> It's mostly OK. Please prefix the subject line with "BUILD:" so that we
> know it affects the build system (just run "git log Makefile" to see
> what we do), but that's just a cosmetic detail.

OK done.

> Thanks,
> Willy

Best Regards,
Fabrice

From 658cd370c3fa90484bfee1c493b7dd9c0248ac57 Mon Sep 17 00:00:00 2001
From: Fabrice Fontaine
Date: Fri, 28 Sep 2018 19:21:26 +0200
Subject: [PATCH] BUILD: Allow configuration of pcre-config path

Add PCRE_CONFIG and PCRE2_CONFIG variables to allow the user to configure
the path of pcre-config or pcre2-config instead of using the one in his
path. This is particularly useful when cross-compiling.

Signed-off-by: Fabrice Fontaine
---
 Makefile | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 382f944f..074e0169 100644
--- a/Makefile
+++ b/Makefile
@@ -78,9 +78,13 @@
 # Other variables :
 #   DLMALLOC_SRC   : build with dlmalloc, indicate the location of dlmalloc.c.
 #   DLMALLOC_THRES : should match PAGE_SIZE on every platform (default: 4096).
+#   PCRE_CONFIG    : force the binary path to get pcre config (by default
+#                    pcre-config)
 #   PCREDIR        : force the path to libpcre.
 #   PCRE_LIB       : force the lib path to libpcre (defaults to $PCREDIR/lib).
 #   PCRE_INC       : force the include path to libpcre ($PCREDIR/inc)
+#   PCRE2_CONFIG   : force the binary path to get pcre2 config (by default
+#                    pcre2-config)
 #   SSL_LIB        : force the lib path to libssl/libcrypto
 #   SSL_INC        : force the include path to libssl/libcrypto
 #   LUA_LIB        : force the lib path to lua
@@ -734,7 +738,8 @@ endif
 # Forcing PCREDIR to an empty string will let the compiler use the default
 # locations.
 
-PCREDIR := $(shell pcre-config --prefix 2>/dev/null || echo /usr/local)
+PCRE_CONFIG := pcre-config
+PCREDIR := $(shell $(PCRE_CONFIG) --prefix 2>/dev/null || echo /usr/local)
 ifneq ($(PCREDIR),)
 PCRE_INC := $(PCREDIR)/include
 PCRE_LIB := $(PCREDIR)/lib
@@ -759,7 +764,8 @@ endif
 endif
 
 ifneq ($(USE_PCRE2)$(USE_STATIC_PCRE2)$(USE_PCRE2_JIT),)
-PCRE2DIR := $(shell pcre2-config --prefix 2>/dev/null || echo /usr/local)
+PCRE2_CONFIG := pcre2-config
+PCRE2DIR := $(shell $(PCRE2_CONFIG) --prefix 2>/dev/null || echo /usr/local)
 ifneq ($(PCRE2DIR),)
 PCRE2_INC := $(PCRE2DIR)/include
 PCRE2_LIB := $(PCRE2DIR)/lib
@@ -777,7 +783,7 @@ endif
 endif
 
-PCRE2_LDFLAGS := $(shell pcre2-config --libs$(PCRE2_WIDTH) 2>/dev/null || echo -L/usr/local/lib -lpcre2-$(PCRE2_WIDTH))
+PCRE2_LDFLAGS := $(shell $(PCRE2_CONFIG) --libs$(PCRE2_WIDTH) 2>/dev/null || echo -L/usr/local/lib -lpcre2-$(PCRE2_WIDTH))
 ifeq ($(PCRE2_LDFLAGS),)
 $(error libpcre2-$(PCRE2_WIDTH) not found)
-- 
2.17.1
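The Makefile logic in the applied patch boils down to "run the named pcre-config if it works, otherwise fall back to /usr/local". That fallback can be sketched in plain shell; the variable name mirrors the patch, while the cross-toolchain path is an assumption for the demo and deliberately does not exist.

```shell
# Emulate the Makefile's
#   PCREDIR := $(shell $(PCRE_CONFIG) --prefix 2>/dev/null || echo /usr/local)
# fallback. A buildroot-style build would set PCRE_CONFIG to a cross
# pcre-config wrapper; here we point it at a nonexistent path on purpose.
PCRE_CONFIG=${PCRE_CONFIG:-/nonexistent/cross/pcre-config}

prefix=$("$PCRE_CONFIG" --prefix 2>/dev/null || echo /usr/local)
echo "PCREDIR=$prefix"
```

With a working cross pcre-config in `PCRE_CONFIG`, `prefix` would instead become the sysroot prefix that tool reports, which is exactly what makes the patch useful for cross-compiling.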
Re: Allow configuration of pcre-config path
Hello Fabrice,

On Sun, Sep 30, 2018 at 12:20:55PM +0200, Fabrice Fontaine wrote:
> Dear all,
>
> I added haproxy to buildroot and to do so, I added a way of configuring
> the path of pcre-config and pcre2-config.

This looks OK however I think from a users' perspective that it would be
better to let the user specify the path to the pcre-config command instead
of only the directory containing it. This gives more flexibility, for
example allowing to have a different name than "pcre-config". Maybe you
could simply call that variable "PCRE_CONFIG" in this case.

> So, please find attached a patch. As this is my first contribution to
> haproxy, please excuse me if I made any mistakes.

It's mostly OK. Please prefix the subject line with "BUILD:" so that we
know it affects the build system (just run "git log Makefile" to see what
we do), but that's just a cosmetic detail.

Thanks,
Willy
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 02:35:24PM +0300, Ciprian Dorin Craciun wrote:
> One question about this: if the client gradually reads from the (server
> side) buffer, but it doesn't completely clear it, would having this
> `TCP_USER_TIMEOUT` configured consider this connection "live"?

Yes, that's it.

> More specifically, say there is 4MB in the server buffer and the client
> "consumes" (i.e. acknowledges) only small parts of it, would the timeout
> apply:
> (A) until the entire buffer is cleared, or
> (B) until at least "some" amount of data is read?

The timeout is an inactivity period. So let's say you set 10s in tcp-ut:
it would only kill the connection if the client acks nothing in 10s, even
if it takes 3 minutes to dump the whole buffer. It's mostly used in
environments with very long connections where clients may disappear
without warning, such as websocket connections or webmails.

Willy
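The tcp-ut setting Willy describes goes on the bind line. A minimal sketch of such a frontend follows; the frontend name, address, and timeout values are assumptions for illustration, not a recommended configuration.

```shell
# Hypothetical frontend: with "tcp-ut 10s" the connection is killed only if
# the client acks nothing for 10 seconds, while a slow-but-alive client can
# keep draining the buffer for the full (larger) client timeout.
cat > tcp-ut-sketch.cfg <<'EOF'
frontend www
    bind :443 tcp-ut 10s
    timeout client 2m
EOF
grep -c 'tcp-ut' tcp-ut-sketch.cfg
```

The pairing shown here (short tcp-ut, generous client timeout) matches the advice in the thread: let TCP-level silence, not transfer duration, decide when to cut a client.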
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 2:22 PM Willy Tarreau wrote:
> > As seen, the timeout which I believe is the culprit is the `timeout
> > client 30s`, which I guess is quite enough.
>
> I tend to consider that if the response starts to be sent, then the most
> expensive part was done and it'd better be completed, otherwise the
> client will try again and inflict the same cost to the server again.

I prefer shorter timeout values because on the server side I have uWSGI
with Python, and with its default model (one process handling one request
at a time), having long outstanding connections could degrade the user
experience.

> You should probably increase this enough so that you don't see unexpected
> timeouts anymore, and rely on tcp-ut to cut early if a client doesn't
> read the data.

One question about this: if the client gradually reads from the (server
side) buffer, but it doesn't completely clear it, would having this
`TCP_USER_TIMEOUT` configured consider this connection "live"?

More specifically, say there is 4MB in the server buffer and the client
"consumes" (i.e. acknowledges) only small parts of it, would the timeout
apply:
(A) until the entire buffer is cleared, or
(B) until at least "some" amount of data is read?

Thanks,
Ciprian.
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 12:23:20PM +0300, Ciprian Dorin Craciun wrote:
> On Sun, Sep 30, 2018 at 12:12 PM Willy Tarreau wrote:
> > > Anyway, why am I trying to configure the sending buffer size: if I
> > > have large downloads and I have (some) slow clients, then as a
> > > consequence HAProxy times out waiting for the kernel buffer to clear.
> >
> > Thus you might have very short timeouts! Usually it's not supposed to
> > be an issue.
>
> I wouldn't say they are "small":
>
>     timeout server 60s
>     timeout server-fin 6s
>     timeout client 30s
>     timeout client-fin 6s
>     timeout tunnel 180s
>     timeout connect 6s
>     timeout queue 30s
>     timeout check 6s
>     timeout tarpit 30s
>
> As seen, the timeout which I believe is the culprit is the `timeout
> client 30s`, which I guess is quite enough.

It's enough for a 2 Mbps bandwidth on the client, not for less. I don't
see the point in setting too short timeouts on the client side for data
transfers. I tend to consider that if the response starts to be sent, then
the most expensive part was done and it'd better be completed, otherwise
the client will try again and inflict the same cost to the server again.
You should probably increase this enough so that you don't see unexpected
timeouts anymore, and rely on tcp-ut to cut early if a client doesn't read
the data.

Willy
Allow configuration of pcre-config path
Dear all,

I added haproxy to buildroot and to do so, I added a way of configuring the
path of pcre-config and pcre2-config. So, please find attached a patch. As
this is my first contribution to haproxy, please excuse me if I made any
mistakes.

Best Regards,
Fabrice

From f3dcdf6c9ffea4d9b89dca9706a48c44bd76c470 Mon Sep 17 00:00:00 2001
From: Fabrice Fontaine
Date: Fri, 28 Sep 2018 19:21:26 +0200
Subject: [PATCH] Allow configuration of pcre-config path

Add a PCRE_CONFIGDIR variable to allow the user to configure the path of
pcre-config or pcre2-config instead of using the one in his path. This is
particularly useful when cross-compiling.

Signed-off-by: Fabrice Fontaine
---
 Makefile | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 382f944f..7c31f1ba 100644
--- a/Makefile
+++ b/Makefile
@@ -78,6 +78,7 @@
 # Other variables :
 #   DLMALLOC_SRC   : build with dlmalloc, indicate the location of dlmalloc.c.
 #   DLMALLOC_THRES : should match PAGE_SIZE on every platform (default: 4096).
+#   PCRE_CONFIGDIR : force the path to pcre-config or pcre2-config
 #   PCREDIR        : force the path to libpcre.
 #   PCRE_LIB       : force the lib path to libpcre (defaults to $PCREDIR/lib).
 #   PCRE_INC       : force the include path to libpcre ($PCREDIR/inc)
@@ -734,7 +735,7 @@ endif
 # Forcing PCREDIR to an empty string will let the compiler use the default
 # locations.
 
-PCREDIR := $(shell pcre-config --prefix 2>/dev/null || echo /usr/local)
+PCREDIR := $(shell $(PCRE_CONFIGDIR)pcre-config --prefix 2>/dev/null || echo /usr/local)
 ifneq ($(PCREDIR),)
 PCRE_INC := $(PCREDIR)/include
 PCRE_LIB := $(PCREDIR)/lib
@@ -759,7 +760,7 @@ endif
 endif
 
 ifneq ($(USE_PCRE2)$(USE_STATIC_PCRE2)$(USE_PCRE2_JIT),)
-PCRE2DIR := $(shell pcre2-config --prefix 2>/dev/null || echo /usr/local)
+PCRE2DIR := $(shell $(PCRE_CONFIGDIR)pcre2-config --prefix 2>/dev/null || echo /usr/local)
 ifneq ($(PCRE2DIR),)
 PCRE2_INC := $(PCRE2DIR)/include
 PCRE2_LIB := $(PCRE2DIR)/lib
@@ -777,7 +778,7 @@ endif
 endif
 
-PCRE2_LDFLAGS := $(shell pcre2-config --libs$(PCRE2_WIDTH) 2>/dev/null || echo -L/usr/local/lib -lpcre2-$(PCRE2_WIDTH))
+PCRE2_LDFLAGS := $(shell $(PCRE_CONFIGDIR)pcre2-config --libs$(PCRE2_WIDTH) 2>/dev/null || echo -L/usr/local/lib -lpcre2-$(PCRE2_WIDTH))
 ifeq ($(PCRE2_LDFLAGS),)
 $(error libpcre2-$(PCRE2_WIDTH) not found)
-- 
2.17.1
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 12:12 PM Willy Tarreau wrote:
> > If so, then by not setting it the kernel should choose the default
> > value, which according to:
> >
> >     sysctl net.ipv4.tcp_wmem
> >     net.ipv4.tcp_wmem = 4096 16384 4194304
> >
> > should be 16384.
>
> No, it *starts* at 16384 then grows up to the configured limit depending
> on the ability to do so without losses and the available memory.

OK. The Linux man-page elides this part... Good to know. :)

> > Anyway, why am I trying to configure the sending buffer size: if I
> > have large downloads and I have (some) slow clients, then as a
> > consequence HAProxy times out waiting for the kernel buffer to clear.
>
> Thus you might have very short timeouts! Usually it's not supposed to
> be an issue.

I wouldn't say they are "small":

    timeout server 60s
    timeout server-fin 6s
    timeout client 30s
    timeout client-fin 6s
    timeout tunnel 180s
    timeout connect 6s
    timeout queue 30s
    timeout check 6s
    timeout tarpit 30s

As seen, the timeout which I believe is the culprit is the `timeout client
30s`, which I guess is quite enough.

> > However, if I configure the buffer size small enough it seems HAProxy
> > is "kept busy" and nothing breaks.
>
> I see, but then maybe you should simply lower the tcp_wmem max value a
> little bit, or increase your timeout?

I'll try to experiment with the tcp_wmem max value as you've suggested.

> > Thus, is there a way to have both OK bandwidth for normal clients, and
> > no timeouts for slow clients?
>
> That's exactly the role of the TCP stack. It measures RTT and losses and
> adjusts the send window accordingly. You must definitely let the TCP
> stack play its role there, you'll have much less problems. Even if you
> keep 4 MB as the max send window, for a 1 Mbps client that's roughly 40
> seconds of transfer. You can deal with this using much larger timeouts
> (1 or 2 minutes), and configure the tcp-ut value on the bind line to get
> rid of clients which do not ACK the data they're being sent at the TCP
> level.

I initially let the TCP stack "do its thing", but it got me into trouble
with poor wireless clients... I'll also give `tcp-ut` a try as suggested.

Thanks,
Ciprian.
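Willy's back-of-the-envelope figure can be checked with shell arithmetic: draining a full 4 MB send window at 1 Mbps takes a bit over half a minute, which is why a 30s `timeout client` sits right at the edge for such clients.

```shell
# 4 MiB buffer expressed in bits, divided by a 1 Mbit/s client rate.
buffer_bits=$((4 * 1024 * 1024 * 8))
rate_bps=$((1000 * 1000))
echo "$((buffer_bits / rate_bps)) seconds"
```

That works out to roughly 33 seconds, in line with the "roughly 40 seconds" estimate once retransmissions and slow start are accounted for.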
Re: [ANNOUNCE] haproxy-1.9-dev3
Hi Willy. Am 30.09.2018 um 11:05 schrieb Willy Tarreau: > Hi Aleks, > > On Sun, Sep 30, 2018 at 10:38:20AM +0200, Aleksandar Lazic wrote: >> Do you have any release date for 1.9, as I plan to launch some new site and >> thought to use 1.9 from beginning because it sounds like that 1.9 will be >> able >> to handle h2 with the backend. > > It's initially planned for end of October/early November, but I think we'll > stretch the months a little bit. The extremely difficult part is the rework > of the HTTP engine to migrate to the native internal representation which > is needed to transport H2 semantics from end to end. While a huge amount > of work has been done on this, it also uncovered some very old design > heritage that needed to be replaced and that takes time to address, such > as the changes to logging and error snapshots to make them work out of > streams, or the change of connection orientation which we initially expected > to postpone after 1.9 but that we discovered late is mandatory to finish the > work, and the change of the idle connections that's needed to maintain > keep-alive on the backend side. > > These changes have a huge impact on the code and the architecture, so as > per the technical vs functional release cycle, I'd really want to have > this in 1.9 so that we have all the basis for much cleaner and calmer > development for 2.0. But I'm sure we will face yet more surprises. > > Thus if we see that it's definitely not workable to complete these changes > by ~November, we'll possibly release without them but will put all of them > in a -next branch that we'll merge soon after the release. However if we > manage to have something almost done, I'm willing to push the deadline a > little bit further to let this be finished. Christopher suggested that we > might have a 3rd option which is to have the two implementations side by > side and that we decide by configuration which one to use depending on > the desired features. 
That's indeed an option (a temporary one) but I > don't like it much due to the risk of increased complexity with bug > reports. That's still definitely something to keep in mind anyway. I agree here with you. > I sincerely hope it's the last time we engage in such complex changes in a > single version! I got caught several years ago during the 1.5 development > and this time it's even more complex than what we had to redesign by then! Well, when I think back to 2003, haproxy is now completely different, cool evolution ;-) > Hoping this clarifies the situation a bit, Yes definitely. I will start with 1.8 just to be on the safe side. Thank you for your always detailed answer. ;-) > Willy Best regards Aleks
Re: [ANNOUNCE] haproxy-1.9-dev3
Hi Aleks, On Sun, Sep 30, 2018 at 10:38:20AM +0200, Aleksandar Lazic wrote: > Do you have any release date for 1.9, as I plan to launch some new site and > thought to use 1.9 from the beginning because it sounds like 1.9 will be able > to handle h2 with the backend. It's initially planned for end of October/early November, but I think we'll stretch the months a little bit. The extremely difficult part is the rework of the HTTP engine to migrate to the native internal representation which is needed to transport H2 semantics from end to end. While a huge amount of work has been done on this, it also uncovered some very old design heritage that needed to be replaced and that takes time to address, such as the changes to logging and error snapshots to make them work out of streams, or the change of connection orientation which we initially expected to postpone after 1.9 but that we discovered late is mandatory to finish the work, and the change of the idle connections that's needed to maintain keep-alive on the backend side. These changes have a huge impact on the code and the architecture, so as per the technical vs functional release cycle, I'd really want to have this in 1.9 so that we have all the basis for much cleaner and calmer development for 2.0. But I'm sure we will face yet more surprises. Thus if we see that it's definitely not workable to complete these changes by ~November, we'll possibly release without them but will put all of them in a -next branch that we'll merge soon after the release. However if we manage to have something almost done, I'm willing to push the deadline a little bit further to let this be finished. Christopher suggested that we might have a 3rd option which is to have the two implementations side by side and that we decide by configuration which one to use depending on the desired features. That's indeed an option (a temporary one) but I don't like it much due to the risk of increased complexity with bug reports. 
That's still definitely something to keep in mind anyway. I sincerely hope it's the last time we engage in such complex changes in a single version! I got caught several years ago during the 1.5 development and this time it's even more complex than what we had to redesign by then! Hoping this clarifies the situation a bit, Willy
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 11:41 AM Ciprian Dorin Craciun wrote: > > - tune.sndbuf.client 16384 allows you to have 16384 bytes "on-the-fly", > > meaning unacknowledged. 16384 / 0.16 sec = roughly 128 KB/s > > - do the math with your value of 131072 and you will get your ~800 > > KB/s. However, something bothers me... Setting `tune.sndbuf.client` is used only to call `setsockopt (SO_SNDBUF)`, right? It is not used by HAProxy for any internal buffer size? If so, then by not setting it the kernel should choose the default value, which according to: > sysctl net.ipv4.tcp_wmem net.ipv4.tcp_wmem = 4096 16384 4194304 , should be 16384. Looking with `netstat` at the `Recv-Q` column, it seems that with no `tune` setting the value even goes up to 5 MB. However, setting the `tune` parameter, it always goes up to around 20 KB. Anyway, why am I trying to configure the sending buffer size: I have large downloads and (some) slow clients, and as a consequence HAProxy times out waiting for the kernel buffer to clear. However, if I configure the buffer size small enough, it seems HAProxy is "kept busy" and nothing breaks. Thus, is there a way to have both OK bandwidth for normal clients, and no timeout for slow clients? Thanks, Ciprian.
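[Editor's note] For reference, the per-socket effect being discussed can be sketched in a few lines. This is a standalone Python illustration, not haproxy code; the 16384 value simply mirrors the tune.sndbuf.client example above. One detail worth knowing: Linux doubles the requested value to leave room for bookkeeping overhead, which is visible via getsockopt() and also affects what netstat-style tools report.

```python
import socket

# Sketch of what a tune.sndbuf.* setting boils down to per socket:
# a single setsockopt(SO_SNDBUF) call (the value here is illustrative).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
requested = 16384
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)

# Linux doubles the requested size to account for bookkeeping overhead,
# so getsockopt() typically reports 2 * requested on Linux.
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(effective)
s.close()
```

So the queue sizes reported by netstat-style tools are not expected to match the configured value exactly, which may explain part of the surprise above.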
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 11:33 AM Mathias Weiersmüller wrote: > Sorry for the extremely brief answer: > - you mentioned you have 160 ms latency. Yes, I mentioned this because I've read somewhere (not remembering now where) that the `SO_SNDBUF` socket option also impacts the TCP window size. > - tune.sndbuf.client 16384 allows you to have 16384 bytes "on-the-fly", > meaning unacknowledged. 16384 / 0.16 sec = roughly 128 KB/s > - do the math with your value of 131072 and you will get your ~800 KB/s. > - no hidden voodoo happening here: read about BDP (Bandwidth Delay Product) Please don't get me wrong: I didn't imply any "voodoo". :) When I asked if there is some "hidden" consequence I didn't mean it as "magic", but as a question about what other (unknown to me) consequences there are. And it seems that `tune.sndbuf.client` also limits the TCP window size. So my question is how can I (if at all possible) configure the buffer size without "breaking" the TCP window size? Thanks, Ciprian.
Re: [ANNOUNCE] haproxy-1.9-dev3
On 29.09.2018 at 20:41, Willy Tarreau wrote: > Subject: [ANNOUNCE] haproxy-1.9-dev3 > To: haproxy@formilux.org > > Hi, > > Now that Kernel Recipes is over (it was another awesome edition), I'm back > to my haproxy activities. Well, I was pleased to see that my coworkers > reserved me a nice surprise by fixing the pending bugs that were plaguing > dev2. I should go to conferences more often, maybe it's a message from > them to make me understand I'm disturbing them when I'm at the office ;-) ;-) > So I thought that it was a good opportunity to issue dev3 now and make it > what dev2 should have been, and forget that miserable one, even though I > was told that I'll soon get another batch of patches to merge, but then > we'll simply emit dev4 so there's no need to further delay pending fixes. > > HAProxy 1.9-dev3 was released on 2018/09/29. It added 35 new commits > after version 1.9-dev2. > > There's nothing fancy here. The connection issues are supposedly addressed > (please expect a bit more in this area soon). The HTTP/1 generic parser is > getting smarter since we're reimplementing the features that were in the > old HTTP code (content-length and transfer-encoding now handled). Lua now > can access stick-tables. I haven't checked precisely how but I saw that > Adis updated the doc so all info should be there. > > Ah, a small change is that we now build with -Wextra after having addressed > all warnings reported up to gcc 7.3 and filtered a few useless ones. If you > get some build warnings, please report them along with your gcc version and > your build options. I personally build with -Werror in addition to this one, > and would like to keep this principle to catch certain bugs or new compiler > jokes earlier in the future. > > As usual, this is an early development version. It's fine if you want to > test the changes, but avoid putting this into production if it can cost > you your job! 
Do you have any release date for 1.9, as I plan to launch some new site and thought to use 1.9 from the beginning because it sounds like 1.9 will be able to handle h2 with the backend. > Please find the usual URLs below : > Site index : http://www.haproxy.org/ > Discourse : http://discourse.haproxy.org/ > Sources : http://www.haproxy.org/download/1.9/src/ > Git repository : http://git.haproxy.org/git/haproxy.git/ > Git Web browsing : http://git.haproxy.org/?p=haproxy.git > Changelog : http://www.haproxy.org/download/1.9/src/CHANGELOG > Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/ Docker Image is updated. https://hub.docker.com/r/me2digital/haproxy19/ > Willy Regards Aleks > --- > Complete changelog : > Adis Nezirovic (1): > MEDIUM: lua: Add stick table support for Lua. > > Bertrand Jacquin (1): > DOC: Fix typos in lua documentation > > Christopher Faulet (3): > MINOR: h1: Add H1_MF_XFER_LEN flag > BUG/MEDIUM: h1: Really skip all updates when incomplete messages are > parsed > BUG/MEDIUM: http: Don't parse chunked body if there is no input data > > Dragan Dosen (1): > BUG/MEDIUM: patterns: fix possible double free when reloading a pattern > list > > Moemen MHEDHBI (1): > DOC: Update configuration doc about the maximum number of stick > counters. > > Olivier Houchard (4): > BUG/MEDIUM: process_stream: Don't use si_cs_io_cb() in process_stream(). > MINOR: h2/stream_interface: Reintroduce the wake() method. > BUG/MEDIUM: h2: Wake the task instead of calling h2_recv()/h2_process(). > BUG/MEDIUM: process_stream(): Don't wake the task if no new data was > received. 
> > Willy Tarreau (24): > BUG/MINOR: h1: don't consider the status for each header > MINOR: h1: report in the h1m struct if the HTTP version is 1.1 or above > MINOR: h1: parse the Connection header field > MINOR: http: add http_hdr_del() to remove a header from a list > MINOR: h1: add headers to the list after controls, not before > MEDIUM: h1: better handle transfer-encoding vs content-length > MEDIUM: h1: deduplicate the content-length header > CLEANUP/CONTRIB: hpack: remove some h1 build warnings > BUG/MINOR: tools: fix set_net_port() / set_host_port() on IPv4 > BUG/MINOR: cli: make sure the "getsock" command is only called on > connections > MINOR: stktable: provide an unchecked version of stktable_data_ptr() > MINOR: stream-int: make si_appctx() never fail > BUILD: ssl_sock: remove build warnings on potential null-derefs > BUILD: stats: remove build warnings on potential null-derefs > BUILD: stream: address null-deref build warnings at -Wextra > BUILD: http: address a couple of null-deref warnings at -Wextra > BUILD: log: silent build warnings due to unchecked > __objt_{server,applet} > BUILD: dns: fix null-deref build warning at -Wextra > BUILD: checks: silence a null-deref build warning at
AW: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
However the bandwidth behaviour is exactly the same: * no `tune.sndbuf.client`, bandwidth goes up to 11 MB/s for a large download; * with `tune.sndbuf.client 16384` it goes up to ~110 KB/s; * with `tune.sndbuf.client 131072` it goes up to ~800 KB/s; * with `tune.sndbuf.client 262144` it goes up to ~1400 KB/s; (These are bandwidths obtained after the TCP window has "settled".) It seems there is a linear correlation between that tune parameter and the bandwidth. However, due to the fact that I get the same behaviour both with and without offloading, I wonder if there isn't somehow a "hidden" consequence of setting this `tune.sndbuf.client` parameter? == Sorry for the extremely brief answer: - you mentioned you have 160 ms latency. - tune.sndbuf.client 16384 allows you to have 16384 bytes "on-the-fly", meaning unacknowledged. 16384 / 0.16 sec = roughly 128 KB/s - do the math with your value of 131072 and you will get your ~800 KB/s. - no hidden voodoo happening here: read about BDP (Bandwidth Delay Product) Cheers Matti
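[Editor's note] Matti's back-of-the-envelope math is the bandwidth-delay product bound; it can be sketched with the thread's own numbers (the 0.16 s RTT is the ~160 ms latency mentioned above; values are illustrative):

```python
# Throughput ceiling imposed by a fixed send buffer over a given RTT:
# at most one buffer's worth of unacknowledged data can be in flight
# per round trip (the bandwidth-delay product bound).
def max_throughput(sndbuf_bytes, rtt_s):
    """Upper bound on throughput in bytes per second."""
    return sndbuf_bytes / rtt_s

RTT = 0.16  # ~160 ms, the latency mentioned in the thread

for sndbuf in (16384, 131072, 262144):
    kib_per_s = max_throughput(sndbuf, RTT) / 1024
    print(f"{sndbuf:>6} B buffer -> ~{kib_per_s:.0f} KiB/s")
```

The bounds come out at roughly 100, 800, and 1600 KiB/s, which line up well with the ~110, ~800, and ~1400 KB/s figures observed above, and explain the apparent linear correlation.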
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 10:35 AM Willy Tarreau wrote: > Note that these are not fragments but segments. And as Matti suggested, > it's indeed due to GSO, you're seeing two TCP frames sent at once through > the stack, and they will be segmented by the NIC. I have disabled all offloading features: tcp-segmentation-offload: off generic-segmentation-offload: off generic-receive-offload: off Now I see "as expected" Ethernet frames with `tcpdump` / `Wireshark`. (There is indeed however a bump in kernel CPU usage.) However the bandwidth behaviour is exactly the same: * no `tune.sndbuf.client`, bandwidth goes up to 11 MB/s for a large download; * with `tune.sndbuf.client 16384` it goes up to ~110 KB/s; * with `tune.sndbuf.client 131072` it goes up to ~800 KB/s; * with `tune.sndbuf.client 262144` it goes up to ~1400 KB/s; (These are bandwidths obtained after the TCP window has "settled".) It seems there is a linear correlation between that tune parameter and the bandwidth. However, due to the fact that I get the same behaviour both with and without offloading, I wonder if there isn't somehow a "hidden" consequence of setting this `tune.sndbuf.client` parameter? Thanks, Ciprian.
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 10:35 AM Willy Tarreau wrote: > On Sun, Sep 30, 2018 at 10:20:06AM +0300, Ciprian Dorin Craciun wrote: > > I was just trying to replicate the issue I've seen yesterday, and for > > a moment (in initial tests) I was able to. However on repeated tests > > it seems that the `tune.rcvbuf.*` (and related) have no impact, as I > > constantly see TCP fragments (around 2842 bytes Ethernet frames). > > Note that these are not fragments but segments. And as Matti suggested, > it's indeed due to GSO, you're seeing two TCP frames sent at once through > the stack, and they will be segmented by the NIC. [Just as info.] So it seems I was able to reproduce the bandwidth issue by only toying with `tune.sndbuf.client`: * with no value, downloading an 8 MB file I get decent bandwidth around 4 MB/s; (for larger files I even get up to 10 MB/s); (a typical Ethernet frame length as reported by `tcpdump` is around 59 KB towards the end of the transfer;) * with that tune parameter set to 128 KB, I get around 1 MB/s; (a typical Ethernet frame length is around 4 KB;) * with that tune parameter set to 16 KB, I get around 100 KB/s; (a typical Ethernet frame length is around 2 KB;) By "typical Ethernet frame length" I mean a packet as reported by `tcpdump` and viewed in Wireshark looks like this (for the first one): Frame 1078: 59750 bytes on wire (478000 bits), 59750 bytes captured (478000 bits) Encapsulation type: Ethernet (1) Arrival Time: Sep 30, 2018 10:26:58.667739000 EEST [Time shift for this packet: 0.0 seconds] Epoch Time: 1538292418.667739000 seconds [Time delta from previous captured frame: 0.18000 seconds] [Time delta from previous displayed frame: 0.18000 seconds] [Time since reference or first frame: 1.901135000 seconds] Frame Number: 1078 Frame Length: 59750 bytes (478000 bits) Capture Length: 59750 bytes (478000 bits) [Frame is marked: False] [Frame is ignored: False] [Protocols in frame: eth:ethertype:ip:tcp:ssl:ssl] [Coloring Rule Name: TCP] [Coloring Rule 
String: tcp] Ethernet II, Src: f2:3c:91:9f:51:b8 (f2:3c:91:9f:51:b8), Dst: Cisco_9f:f0:0a (00:00:0c:9f:f0:0a) Destination: Cisco_9f:f0:0a (00:00:0c:9f:f0:0a) Source: f2:3c:91:9f:51:b8 (f2:3c:91:9f:51:b8) Type: IPv4 (0x0800) Internet Protocol Version 4, Src: XXX.XXX.XXX.XXX, Dst: XXX.XXX.XXX.XXX 0100 = Version: 4 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) 00.. = Differentiated Services Codepoint: Default (0) ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0) Total Length: 59736 Identification: 0x8d7a (36218) Flags: 0x4000, Don't fragment 0... = Reserved bit: Not set .1.. = Don't fragment: Set ..0. = More fragments: Not set ...0 = Fragment offset: 0 Time to live: 64 Protocol: TCP (6) Header checksum: 0x3054 [validation disabled] [Header checksum status: Unverified] Source: XXX.XXX.XXX.XXX Destination: XXX.XXX.XXX.XXX Transmission Control Protocol, Src Port: 443, Dst Port: 38150, Seq: 8271805, Ack: 471, Len: 59684 Source Port: 443 Destination Port: 38150 [Stream index: 0] [TCP Segment Len: 59684] Sequence number: 8271805(relative sequence number) [Next sequence number: 8331489(relative sequence number)] Acknowledgment number: 471(relative ack number) 1000 = Header Length: 32 bytes (8) Flags: 0x010 (ACK) Window size value: 234 [Calculated window size: 29952] [Window size scaling factor: 128] Checksum: 0x7d1c [unverified] [Checksum Status: Unverified] Urgent pointer: 0 Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps [SEQ/ACK analysis] [Timestamps] TCP payload (59684 bytes) TCP segment data (16095 bytes) TCP segment data (10779 bytes) [2 Reassembled TCP Segments (16405 bytes): #1073(310), #1078(16095)] [Frame: 1073, payload: 0-309 (310 bytes)] [Frame: 1078, payload: 310-16404 (16095 bytes)] [Segment count: 2] [Reassembled TCP length: 16405] Secure Sockets Layer Secure Sockets Layer I'll try to disable offloading and see what happens. 
I forgot to say that this is a paravirtualized VM running on Linode in their Dallas datacenter. Ciprian.
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 10:20:06AM +0300, Ciprian Dorin Craciun wrote: > On Sun, Sep 30, 2018 at 10:06 AM Mathias Weiersmüller > wrote: > > I am pretty sure you have TCP segmentation offload enabled. The TCP/IP > > stack therefore sends bigger-than-allowed TCP segments towards the NIC who > > in turn takes care about the proper segmentation. > > I was just trying to replicate the issue I've seen yesterday, and for > a moment (in initial tests) I was able to. However on repeated tests > it seems that the `tune.rcvbuf.*` (and related) have no impact, as I > constantly see TCP fragments (around 2842 bytes Ethernet frames). Note that these are not fragments but segments. And as Matti suggested, it's indeed due to GSO, you're seeing two TCP frames sent at once through the stack, and they will be segmented by the NIC. > > You want to check the output of "ethtool -k eth0" and the values of: > > tcp-segmentation-offload > > generic-segmentation-offload > > The output of `ethtool -k eth0` is below: > > tcp-segmentation-offload: on > tx-tcp-segmentation: on > tx-tcp-ecn-segmentation: on > tx-tcp-mangleid-segmentation: off > tx-tcp6-segmentation: on > generic-segmentation-offload: on > Indeed. Willy
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 07:06:29AM +, Mathias Weiersmüller wrote: > I am pretty sure you have TCP segmentation offload enabled. The TCP/IP stack > therefore sends bigger-than-allowed TCP segments towards the NIC who in turn > takes care about the proper segmentation. > > You want to check the output of "ethtool -k eth0" and the values of: > tcp-segmentation-offload > generic-segmentation-offload Yep totally agreed, as soon as you have either GSO or TSO, you will see large frames. Ciprian in this case it's better to capture from another machine in the path to get a reliable capture. You can also disable TSO/GSO using ethtool -K, but be prepared to see a significant bump in CPU usage. Don't do this if you are already running above 20% CPU usage on average. Willy
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 10:06 AM Mathias Weiersmüller wrote: > I am pretty sure you have TCP segmentation offload enabled. The TCP/IP stack > therefore sends bigger-than-allowed TCP segments towards the NIC who in turn > takes care about the proper segmentation. I was just trying to replicate the issue I've seen yesterday, and for a moment (in initial tests) I was able to. However on repeated tests it seems that the `tune.rcvbuf.*` (and related) have no impact, as I constantly see TCP fragments (around 2842 bytes Ethernet frames). > You want to check the output of "ethtool -k eth0" and the values of: > tcp-segmentation-offload > generic-segmentation-offload The output of `ethtool -k eth0` is below: tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp-mangleid-segmentation: off tx-tcp6-segmentation: on generic-segmentation-offload: on Thanks, Ciprian.
AW: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
I am pretty sure you have TCP segmentation offload enabled. The TCP/IP stack therefore sends bigger-than-allowed TCP segments towards the NIC who in turn takes care about the proper segmentation. You want to check the output of "ethtool -k eth0" and the values of: tcp-segmentation-offload generic-segmentation-offload Cheers Mathias -----Original Message----- From: Ciprian Dorin Craciun Sent: Sunday, 30 September 2018 08:30 To: w...@1wt.eu Cc: haproxy@formilux.org Subject: Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation? On Sun, Sep 30, 2018 at 9:08 AM Willy Tarreau wrote: > > I've played with `tune.rcvbuf.server`, `tune.sndbuf.server`, > > `tune.rcvbuf.client`, and `tune.sndbuf.client` and explicitly set > > them to various values ranging from 4k to 256k. Unfortunately in > > all cases it seems that this generates too large TCP packets (larger > > than the advertised and agreed MSS in both direction), which in turn > > leads to TCP fragmentation and reassembly. (Both client and server > > are Linux > > >4.10. The protocol used was HTTP 1.1 over TLS 1.2.) > > No no no, I'm sorry but this is not possible at all. You will never > find a single TCP stack doing this! I'm pretty sure there is an issue > somewhere in your capture or analysis. > > [...] > > However, if the problem you're experiencing is only with the listening > side, there's an "mss" parameter that you can set on your "bind" lines > to enforce a lower MSS, it may be a workaround in your case. I'm > personally using it at home to reduce the latency over ADSL ;-) I am also extremely skeptical that this is HAProxy's fault, however the only change needed to eliminate this issue was commenting out these tune arguments. I have also explicitly set the `mss` parameter to `1400`. The capture was taken directly on the server on the public interface. I'll try to make a fresh capture to see if I can replicate this. 
> > The resulting bandwidth was around 10 MB. > > Please use correct units when reporting issues, in order to reduce the > confusion. "10 MB" is not a bandwidth but a size (10 megabytes). Most > likely you want to mean 10 megabytes per second (10 MB/s). But maybe > you even mean 10 megabits per second (10 Mb/s or 10 Mbps), which > equals > 1.25 MB/s. :) Sorry for that. (That's the outcome of writing emails at 3 AM after 4 hours of poking into a production system.) I completely agree with you about the MB/Mb consistency, and I always hate that some providers still use MB to mean mega-bits, like it's 2000. :) Yes, I meant 10 mega-bytes / second. Sorry again. Ciprian.
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
On Sun, Sep 30, 2018 at 9:08 AM Willy Tarreau wrote: > > I've played with `tune.rcvbuf.server`, `tune.sndbuf.server`, > > `tune.rcvbuf.client`, and `tune.sndbuf.client` and explicitly set them > > to various values ranging from 4k to 256k. Unfortunately in all cases > > it seems that this generates too large TCP packets (larger than the > > advertised and agreed MSS in both direction), which in turn leads to > > TCP fragmentation and reassembly. (Both client and server are Linux > > >4.10. The protocol used was HTTP 1.1 over TLS 1.2.) > > No no no, I'm sorry but this is not possible at all. You will never find > a single TCP stack doing this! I'm pretty sure there is an issue somewhere > in your capture or analysis. > > [...] > > However, if the problem you're experiencing is only with the listening > side, there's an "mss" parameter that you can set on your "bind" lines > to enforce a lower MSS, it may be a workaround in your case. I'm > personally using it at home to reduce the latency over ADSL ;-) I am also extremely skeptical that this is HAProxy's fault, however the only change needed to eliminate this issue was commenting out these tune arguments. I have also explicitly set the `mss` parameter to `1400`. The capture was taken directly on the server on the public interface. I'll try to make a fresh capture to see if I can replicate this. > > The resulting bandwidth was around 10 MB. > > Please use correct units when reporting issues, in order to reduce the > confusion. "10 MB" is not a bandwidth but a size (10 megabytes). Most > likely you want to mean 10 megabytes per second (10 MB/s). But maybe > you even mean 10 megabits per second (10 Mb/s or 10 Mbps), which equals > 1.25 MB/s. :) Sorry for that. (That's the outcome of writing emails at 3 AM after 4 hours of poking into a production system.) I completely agree with you about the MB/Mb consistency, and I always hate that some providers still use MB to mean mega-bits, like it's 2000. 
:) Yes, I meant 10 mega-bytes / second. Sorry again. Ciprian.
Re: Do `tune.rcvbuf.server` and `tune.sndbuf.server` (and their `tune.*.client` equivalents) lead to TCP fragmentation?
Hi Ciprian, On Sat, Sep 29, 2018 at 09:57:20PM +0300, Ciprian Dorin Craciun wrote: > Hello all! > > I've played with `tune.rcvbuf.server`, `tune.sndbuf.server`, > `tune.rcvbuf.client`, and `tune.sndbuf.client` and explicitly set them > to various values ranging from 4k to 256k. Unfortunately in all cases > it seems that this generates too large TCP packets (larger than the > advertised and agreed MSS in both direction), which in turn leads to > TCP fragmentation and reassembly. (Both client and server are Linux > >4.10. The protocol used was HTTP 1.1 over TLS 1.2.) No no no, I'm sorry but this is not possible at all. You will never find a single TCP stack doing this! I'm pretty sure there is an issue somewhere in your capture or analysis. MSS is the maximum segment size and corresponds to the maximum *payload* transported over TCP. It doesn't include the IP nor TCP headers. Usually over Ethernet it's 1460, resulting in 1500-byte packets. If you're seeing fragments, it very likely is due to an intermediary router or firewall which has a shorter MTU at some point, such as an IPSEC VPN, IP tunnel or ADSL link, and which must fragment to deliver the data. Some such equipment is capable of interfering with the MSS negotiation to reduce it to fit the MTU reduction; you need to check on the affected equipment. Also, regarding your initial question, tune.rcvbuf/sndbuf will have no effect on all this since they only specify the extra buffer size in the system. However, if the problem you're experiencing is only with the listening side, there's an "mss" parameter that you can set on your "bind" lines to enforce a lower MSS, it may be a workaround in your case. I'm personally using it at home to reduce the latency over ADSL ;-) > The resulting bandwidth was around 10 MB. Please use correct units when reporting issues, in order to reduce the confusion. "10 MB" is not a bandwidth but a size (10 megabytes). Most likely you want to mean 10 megabytes per second (10 MB/s). 
But maybe you even mean 10 megabits per second (10 Mb/s or 10 Mbps), which equals 1.25 MB/s. Regards, Willy
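[Editor's note] The MSS/MTU relationship Willy describes above is plain arithmetic; a small sketch with illustrative constants (IPv4 and TCP without header options):

```python
# MSS is the maximum TCP *payload* per segment: the path MTU minus the
# IP and TCP headers (20 bytes each when no options are present).
ETH_MTU = 1500    # typical Ethernet MTU, in bytes
IPV4_HEADER = 20  # IPv4 header, no options
TCP_HEADER = 20   # TCP header, no options

mss = ETH_MTU - IPV4_HEADER - TCP_HEADER
print(mss)  # 1460, the usual Ethernet value quoted above

# A tunnel or VPN that shrinks the effective MTU shrinks the usable MSS
# accordingly, e.g. a hypothetical 1400-byte path MTU:
print(1400 - IPV4_HEADER - TCP_HEADER)  # 1360
```

This is also why forcing a lower value with the "mss" bind parameter works around fragmentation on paths whose real MTU is below 1500.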