Re: Slower responses from me starting now
On 2023-06-02 18:44, Willy Tarreau wrote: Hi all, with 2.8 released and nice weather here, I decided to take a few weeks of holidays (I think the last time was in September 2016, so I don't remember how it feels). No travel plans in sight, mostly hacking on stuff at home and catching up with kernel work I postponed due to the upcoming release, so I'll continue to look at my messages from time to time but with no real frenzy. Keep this in mind if you send me a direct message and don't get a response, and preferably contact the list or the subsystem maintainers for questions, patches etc.; that will be much more efficient. And if I get more messages than I can handle, it's not completely impossible I'll proceed like Linus: delete all on return ;-) Christopher and Amaury just verified they can produce releases if needed, so you're in good hands. Cheers, Willy

Enjoy your holidays and rest! :) -- Regards, Christian Ruppert
Re: Debugging PH state
On 2023-03-13 15:50, Christopher Faulet wrote: On 3/6/23 at 08:50, Christian Ruppert wrote: On 2023-03-03 16:37, Christopher Faulet wrote: On 3/3/23 at 15:40, Christian Ruppert wrote: So I can reproduce it. I captured the response and extracted the data, which includes the entire header + payload. I put that into a file like foo.bin and started a "nc" that just serves that one response on every request. It works fine, no 500, no PH. Even with the same request, so I really don't get it. I really wonder if that's a HAProxy bug.

Hi Christian, it is indeed possible. Could you share your configuration? Is there anything special in the response?

Hi Christopher, I can share the config off-list. I don't see anything special, neither in the request nor the response. Do you know where the 500/PH is being set in the source, by chance? Perhaps I can add some additional debugging there. If it's an invalid response, I'd guess that Apache would already block that from the actual backend (Apache is in front of it). The response has a small header and a small body. XML. Too bad I cannot get that 500 on every request/response :(

Hi Christian, sorry for the delay, I'm just back from vacation. So sure, you can send me your configuration off-list. If you have a simple reproducer, share it too; it will be easier to understand what happens. In the code, you can find rewrite errors by grepping for "failed_rewrites". It's a frontend/backend/listener/server counter. A rewrite error may happen every time haproxy tries to alter the request or the response.

Hi Christopher, no problem! I've been busy with other stuff as well. The customer decided not to dig into it any further, since it only happens sometimes and only in some specific cases. I can still try to get a copy of the relevant parts of the config anyway, perhaps also including one of those responses. But as I said, I simulated it via "nc" and replied with the same response that failed, and everything was fine.
So I don't have anything to trigger that one for sure :/ -- Regards, Christian Ruppert
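The "nc"-based reproducer described in this thread (replay one captured response to every request) can also be sketched as a tiny script, which avoids restarting nc between connections. A minimal Python sketch; the file name foo.bin, address and port are illustrative assumptions, not from the original test:

```python
import socket
import threading

def serve_replay(payload: bytes, host: str = "127.0.0.1", port: int = 8080):
    """Accept connections forever and answer every request with the same
    captured response bytes, like `nc` replaying foo.bin in a loop."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(16)
    while True:
        conn, _ = srv.accept()
        try:
            conn.recv(65536)       # read (and discard) the request
            conn.sendall(payload)  # replay the captured response verbatim
        finally:
            conn.close()

if __name__ == "__main__":
    # In the real test the payload would come from the capture:
    #   payload = open("foo.bin", "rb").read()
    payload = (b"HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n"
               b"Content-Length: 5\r\nConnection: close\r\n\r\n<r/>\n")
    threading.Thread(target=serve_replay, args=(payload,), daemon=True).start()
```

Pointing a test backend's server line at this port then exercises HAProxy against the exact captured bytes.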
Re: Debugging PH state
On 2023-03-03 16:37, Christopher Faulet wrote: On 3/3/23 at 15:40, Christian Ruppert wrote: So I can reproduce it. I captured the response and extracted the data, which includes the entire header + payload. I put that into a file like foo.bin and started a "nc" that just serves that one response on every request. It works fine, no 500, no PH. Even with the same request, so I really don't get it. I really wonder if that's a HAProxy bug.

Hi Christian, it is indeed possible. Could you share your configuration? Is there anything special in the response?

Hi Christopher, I can share the config off-list. I don't see anything special, neither in the request nor the response. Do you know where the 500/PH is being set in the source, by chance? Perhaps I can add some additional debugging there. If it's an invalid response, I'd guess that Apache would already block that from the actual backend (Apache is in front of it). The response has a small header and a small body. XML. Too bad I cannot get that 500 on every request/response :( -- Regards, Christian Ruppert
Re: Debugging PH state
So I can reproduce it. I captured the response and extracted the data, which includes the entire header + payload. I put that into a file like foo.bin and started a "nc" that just serves that one response on every request. It works fine, no 500, no PH. Even with the same request, so I really don't get it. I really wonder if that's a HAProxy bug. -- Regards, Christian Ruppert
Debugging PH state
0.36 2020-12-04
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
          fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
     <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
            h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
     <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
          none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

-- Regards, Christian Ruppert
Re: "Success" logs in HTTP frontends
On 2022-08-19 11:50, Christian Ruppert wrote: On 2022-08-01 09:45, Christian Ruppert wrote: On 2022-07-29 13:59, William Lallemand wrote: On Fri, Jul 29, 2022 at 11:10:32AM +0200, Christopher Faulet wrote: Le 7/29/22 à 10:13, Christian Ruppert a écrit : > Hi list, > > so I noticed on my private HAProxy I have 2 of those logs within the > past ~1-2 months: > haproxy[28669]: 1.2.3.4:48596 [17/Jun/2022:13:55:18.530] public/HTTPSv4: > Success > > So that's nothing so far but still no idea what that means. > At work, of 250 mio log entries per day, there are about 600k of those > "Success" ones. > haproxy[27892]: 192.168.70.102:7904 [29/May/2022:00:13:37.316] > genfrontend_35310-foobar/3: Success > > I'm not sure what it means by "3". Is it the third bind? > > I couldn't trigger those "Success" logs by either restarting or > reloading. What is it for / where does it come from? > Hi Christian, What is your version ? At first glance, I can't find such log message in the code. It could come from a lua module. In fact, I found something. It is probably because an "embryonic" session is killed with no connection/ssl error. For instance, an SSL connection rejected because of a "tcp-request session" rule (so after the SSL handshake). The same may happen with a listener using the PROXY protocol. Regards, Could be something like that indeed, the "Success" message is the string for CO_ER_NONE in the fc_err_str fetch. (The default error string) Maybe we lack some intermediate state, or we could just change the string ? It is only the string for the handshake status so this is confusing when used as an error. Since it's that much every day I'd agree to change/improve it. If it's the connection one then I only see it in combination with SOCKS. There is no SOCKS in my config though, unless that also triggers if something does a SOCKS request on that bind anyway. I wasn't able to reproduce/trigger it that way yet. Does anybody know how to trigger that on purpose? 
Would be really interesting. So we have one system that does some DNAT stuff, well kind of at least, and triggers around ~700k of those "Success" logs per day. I still couldn't figure out the exact reason. The only hint I have is that it's really mostly that "DNAT" host. Some others also have some but by far not that much. On the same day, the others are all between 0 and 50. -- Regards, Christian Ruppert
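One way to narrow down which binds and clients produce these entries is to make the killed-embryonic-session log line carry the raw error codes next to the (misleading) "Success" string. A sketch only, assuming a HAProxy version recent enough to have error-log-format plus the fc_err and ssl_fc_err fetches (2.5+); the frontend name and format are illustrative:

```
frontend public
    bind :443 ssl crt /etc/haproxy/certs/site.pem
    log global
    # Sessions killed before a full request use this format, so the
    # numeric codes appear next to the default "Success" string.
    error-log-format "%ci:%cp [%t] %ft: fc_err=%[fc_err] (%[fc_err_str]) ssl=%[ssl_fc_err,hex]"
```

With fc_err logged as a number, identical-looking "Success" lines from different causes become distinguishable.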
Weird behavior with UNIX sockets
Hey, I've got a kinda strange problem with UNIX sockets. I first thought it was a Varnish issue, but it may actually be a HAProxy one. So I have this attached test config, a static error file and a test script (Perl, no third-party modules, just a few lines), which I'd like to share only off-list (just mail me / let me know) to prevent script kiddies from abusing it. We have those huge timeouts just for this test case, and maxconn 500k. The test script will do 200k connections. It will open all 200k connections (verify via netstat/ss or something else) and wait for a /tmp/debug file to exist (just touch it when ready). If it exists, it will send GET / requests as fast as possible on all the connections. It was initially meant to test a connection to Varnish instead.

What happens is the strange part. Often: some connections/requests are answered with a 503 by HAProxy, which logs an "SC" on the backend connection:
Aug 25 18:14:20 localhost haproxy[23512]: unix:1 [25/Aug/2022:18:14:20.051] li_udstest be_static_err/ 62/-1/-1/-1/62 503 97 - - SC-- 162032/1/0/0/0 0/0 "GET / HTTP/1.1"

Sometimes: the mentioned 503s plus:
Aug 25 18:14:19 localhost haproxy[23512]: Connect() failed for backend be_udstest: can't connect to destination unix socket, check backlog size on the server.

Also sometimes: all 200s, everything fine.

I'd expect it to behave basically the same every time, but those are three completely different behaviors. And actually I'd like it to only answer 200s :) The process limits look good so far:
# grep 'open files' /proc/$(pgrep -n haproxy)/limits
Max open files 182 800 files
On another test machine we even have a 30mio hard and 10mio soft limit, so those limits should be enough in any case. This has been tested with 2.6.4, 2.6.3 and 2.6.2. The same happens if be_udstest points to e.g. Varnish via UDS. Can you reproduce it? Any idea what may cause it?
-- Regards, Christian Ruppert

global
    maxconn 50

defaults
    maxconn 50
    timeout client 15m
    timeout client-fin 15m
    timeout connect 15m
    timeout http-request 15m
    timeout queue 15m
    timeout http-keep-alive 15m
    timeout server 15m
    log 127.0.0.1 len 65535 local0

frontend fe_udstest
    mode http
    bind :61610
    log global
    option httplog
    default_backend be_udstest

backend be_udstest
    mode http
    server udstest unix@/run/udstest.sock

listen li_udstest
    mode http
    option httplog
    bind unix@/run/udstest.sock mode 666
    default_backend be_static_err

backend be_static_err
    mode http
    errorfile 503 /etc/haproxy/static-err.txt

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Test
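The "check backlog size on the server" alert suggests the accept queue on the UNIX socket overflows under the 200k burst: the listener backlog can fill long before any fd limit is reached. A possible thing to test, not a confirmed fix; the backlog value is illustrative, and the kernel ceiling (net.core.somaxconn, raised e.g. via `sysctl -w net.core.somaxconn=100000`) must be lifted as well since it clamps listen() backlogs:

```
listen li_udstest
    mode http
    option httplog
    # explicit large accept backlog on the UDS bind (otherwise it is
    # derived from maxconn and clamped by net.core.somaxconn)
    bind unix@/run/udstest.sock mode 666 backlog 100000
    default_backend be_static_err
```

If the intermittent 503/SC disappears with a larger backlog, that would point at connect bursts outrunning accept() rather than at a HAProxy bug.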
Re: "Success" logs in HTTP frontends
On 2022-08-01 09:45, Christian Ruppert wrote: On 2022-07-29 13:59, William Lallemand wrote: On Fri, Jul 29, 2022 at 11:10:32AM +0200, Christopher Faulet wrote: Le 7/29/22 à 10:13, Christian Ruppert a écrit : > Hi list, > > so I noticed on my private HAProxy I have 2 of those logs within the > past ~1-2 months: > haproxy[28669]: 1.2.3.4:48596 [17/Jun/2022:13:55:18.530] public/HTTPSv4: > Success > > So that's nothing so far but still no idea what that means. > At work, of 250 mio log entries per day, there are about 600k of those > "Success" ones. > haproxy[27892]: 192.168.70.102:7904 [29/May/2022:00:13:37.316] > genfrontend_35310-foobar/3: Success > > I'm not sure what it means by "3". Is it the third bind? > > I couldn't trigger those "Success" logs by either restarting or > reloading. What is it for / where does it come from? > Hi Christian, What is your version ? At first glance, I can't find such log message in the code. It could come from a lua module. In fact, I found something. It is probably because an "embryonic" session is killed with no connection/ssl error. For instance, an SSL connection rejected because of a "tcp-request session" rule (so after the SSL handshake). The same may happen with a listener using the PROXY protocol. Regards, Could be something like that indeed, the "Success" message is the string for CO_ER_NONE in the fc_err_str fetch. (The default error string) Maybe we lack some intermediate state, or we could just change the string ? It is only the string for the handshake status so this is confusing when used as an error. Since it's that much every day I'd agree to change/improve it. If it's the connection one then I only see it in combination with SOCKS. There is no SOCKS in my config though, unless that also triggers if something does a SOCKS request on that bind anyway. I wasn't able to reproduce/trigger it that way yet. Does anybody know how to trigger that on purpose? Would be really interesting. -- Regards, Christian Ruppert
Re: lua: Add missed lua 5.4 references
Hey guys, is there any news on this, or did this one just get lost? I couldn't find a response to it, so I assume it got lost. Or is there anything against it? Too bad forwarding doesn't work, and this mail is quite old already: https://www.mail-archive.com/haproxy@formilux.org/msg39689.html -- Regards, Christian Ruppert
Re: how to install on RHEL7 and 8
On 2022-05-26 20:28, Ryan O'Hara wrote: On Wed, May 25, 2022 at 11:15 AM William Lallemand wrote: On Tue, May 24, 2022 at 08:56:14PM +, Alford, Mark wrote: Do you have instructions on the exact libraries needed to do the full install on RHEL 7 and RHEL 8? I read the INSTALL doc in the tarball and did the make command, and it failed because of LUA, but lua.2.5.3 is installed. Please help. Hello, I'm using this thread to launch a call for help about the Red Hat packaging. I am the maintainer for all the Red Hat and Fedora packages. Feel free to ask questions here on the mailing list or email me directly. We try to document the list of available packages here: https://github.com/haproxy/wiki/wiki/Packages The IUS repository is known to work but only provides packages as far as 2.2; no 2.3, 2.4 or 2.5 are there, but I'm seeing an open ticket for 2.4 here: https://github.com/iusrepo/wishlist/issues/303 Unfortunately nobody ever stepped up to constantly maintain the upstream releases for redhat/centos like it's done for ubuntu/debian on haproxy.debian.net [1]. I try to keep Fedora up to date with the latest upstream, but once a release goes into a specific Fedora release (e.g. haproxy-2.4 in Fedora 35) I don't update to haproxy-2.5 in that same release. I have in the past, and I got angry emails about rebasing to a newer release. I've spoken to Willy about this in the past and we seem to be in agreement on this. RHEL is different. We almost never rebase to a later major release for the lifetime of RHEL. The one exception was when we added haproxy-1.8 to RHSCL (software collections) in RHEL7, since base RHEL7 had haproxy-1.5 and there were significant features added in the 1.8 release. I get this complaint often for haproxy in RHEL. Keep in mind that RHEL is focused on consistency and stability over a long period of time. I can't stress this enough: it is extremely rare to rebase to a new, major release of haproxy (or anything else) within a major RHEL release.
For example, RHEL9 has haproxy-2.4 and will likely always have that version. I do often rebase to a newer minor release to pick up bug fixes (e.g. haproxy-2.4.8 will be updated to haproxy-2.4.17, but very unlikely to anything beyond the latest 2.4 release). I understand this is not for everybody. IMHO, if you pick an LTS or even a non-LTS (depending on how long the distro version is being supported) but keep it close to upstream releases by doing minor bumps, that's totally fine. That way, like you said, users get bug fixes and not just hand-picked patches. That's far better, I'd say. Maybe it could be done with IUS; it's as simple as a pull request on their GitHub for each new release, but someone needs to be involved. I'm not a Red Hat user, but from time to time someone asks for a Red Hat package, and nothing is really available and maintained outside of the official Red Hat one. As mentioned elsewhere, COPR is likely the best place for this. It has been a while since I've used it, but there have been times I did special, unsupported builds in COPR for others to use. Hope this helps. Ryan Links: -- [1] http://haproxy.debian.net -- Regards, Christian Ruppert
Backporting "MEDIUM: mworker: reexec in waitpid mode after successful loading" to 2.4
Hi guys, William, can we please get that "MEDIUM: mworker: reexec in waitpid mode after successful loading" - fab0fdce981149a4e3172f2b81113f505f415595 backported to 2.4? I seem to run into it, at least on one of our 40 LBs. This one is a VM though. It sometimes crashes after each reload. Running 2.5 with fab0fdce981149a4e3172f2b81113f505f415595 seems to fix the issue for me. https://github.com/haproxy/haproxy/commit/fab0fdce981149a4e3172f2b81113f505f415595 -- Regards, Christian Ruppert
Re: stats / "show servers conn" loses counter after reload
On 2021-02-12 12:06, William Dauchy wrote: Hi Christian, On Fri, Feb 12, 2021 at 11:59 AM Christian Ruppert wrote: Is this a bug? Can you confirm this behavior? Is there any other way I could figure out whether a backend is currently in use? unfortunately reload does not recover stats values; it is a known problem; see also https://github.com/haproxy/haproxy/issues/954 Thanks, William! I just commented on that issue. -- Regards, Christian Ruppert
stats / "show servers conn" loses counter after reload
Hi list, I'm not sure if this is intended; to me it looks like a bug. I was trying to figure out whether a backend is in use or not, so I was looking at used_cur:

echo "show servers conn somebackend_rtmp" | socat stdio /var/run/haproxy.stat
# bkname/svname bkid/svid addr port - purge_delay used_cur used_max need_est unsafe_nb safe_nb idle_lim idle_cur idle_per_thr[48]
somebackend_rtmp/localhost 615/1 127.0.0.1 50643 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I then noticed that those values don't match what tcpdump says. I tracked it down to being caused by a reload. A test case:
1. Create a large file with dd, or throttle your browser's connection to modem speed or so, to just keep the connection open.
2. Do a "show servers conn" and make sure used_cur is > 0.
3. Reload HAProxy.
4. Notice it's 0 even though your download / session continues.

I tested it with tcp as well as http mode. I also have "expose-fd listeners" in use, in case it matters. Affected are at least 2.2.9 as well as 2.3.5. The stats backend also loses its counter values. Is this a bug? Can you confirm this behavior? Is there any other way I could figure out whether a backend is currently in use? -- Regards, Christian Ruppert
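For scripting the "is this backend in use" check, the used_cur column can be pulled out of the `show servers conn` output shown above. A minimal Python sketch; the column position is located via the header line rather than hard-coded, but this is parsing human-oriented CLI output, not a stable API:

```python
def backend_in_use(output: str) -> bool:
    """Return True if any server line in `show servers conn` output
    reports used_cur > 0."""
    lines = [l for l in output.splitlines() if l.strip()]
    # First line is the "# bkname/svname ..." header; find used_cur there.
    header = lines[0].lstrip("# ").split()
    idx = header.index("used_cur")
    for line in lines[1:]:
        fields = line.split()
        if int(fields[idx]) > 0:
            return True
    return False
```

The output string itself would typically be obtained the same way as in the mail, e.g. `echo "show servers conn somebackend_rtmp" | socat stdio /var/run/haproxy.stat`.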
Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"
On 2021-02-10 18:15, Christopher Faulet wrote: On 08/02/2021 at 14:31, Christian Ruppert wrote: Hi list, Christopher, we're having issues with the mentioned commit / patch: d13afbcce5e664f9cfe797eee8c527e5fa947f1b https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any useful details yet, just that our Jira fails to load. A curl against the site seems to work fine while browser requests (Chrome / Firefox) seem to time out, or at least some of them do. See the attached log. The first 3 requests seem to be fine so far. Then, much later, there's a 504 between more 200s. I'm not sure yet why the other 200s there seem to wait / are logged after the actual timeout happens. According to Chrome's F12 tools there are more requests still pending. Ignore the 503 there; that seems to be an unrelated problem, since it also happens with a working HAProxy. Much later, the site loaded, sometimes broken though. I'll try to prepare a config snippet if required. Is there anything known already?

Hi, thanks to information that Christian provided me off-list, I've finally found and fixed the bug. The corresponding commit is:

commit a22782b597ee9a3bfecb18a66e29633c8e814216
Author: Christopher Faulet
Date: Mon Feb 8 17:18:01 2021 +0100

BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state

During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag must always be set on the conn-stream if following conditions are met :
* It is a response or
* It is a request but not a protocol upgrade nor a CONNECT.
For now, there is no test on the message type (request or response). Thus the CS_FL_EOI flag is not set for a response with a "Connection: upgrade" header but not a 101 response. This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"). It was backported as far as 2.0.
Thus, this patch must also be backported as far as 2.0. However, it is not backported yet. Thanks Christian ! Thanks for the very fast patching, Christopher! I've rolled out the new version on some more production machines and I haven't noticed or heard of any issues yet. Tomorrow I'll roll it out to the rest of our LBs. -- Regards, Christian Ruppert
Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"
On 2021-02-08 14:46, Christopher Faulet wrote: On 08/02/2021 at 14:31, Christian Ruppert wrote: Hi list, Christopher, we're having issues with the mentioned commit / patch: d13afbcce5e664f9cfe797eee8c527e5fa947f1b https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any useful details yet, just that our Jira fails to load. A curl against the site seems to work fine while browser requests (Chrome / Firefox) seem to time out, or at least some of them do. See the attached log. The first 3 requests seem to be fine so far. Then, much later, there's a 504 between more 200s. I'm not sure yet why the other 200s there seem to wait / are logged after the actual timeout happens. According to Chrome's F12 tools there are more requests still pending. Ignore the 503 there; that seems to be an unrelated problem, since it also happens with a working HAProxy. Much later, the site loaded, sometimes broken though. I'll try to prepare a config snippet if required. Is there anything known already?

Thanks Christian, I'll take a look. Could you confirm whether it happens only with requests carrying a "Connection: upgrade" header?

This frontend doesn't have H2 enabled explicitly. I'm not really sure, but it looks like some of those delayed requests don't have the upgrade header. -- Regards, Christian Ruppert
Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"
Hi list, Christopher, we're having issues with the mentioned commit / patch: d13afbcce5e664f9cfe797eee8c527e5fa947f1b https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any useful details yet, just that our Jira fails to load. A curl against the site seems to work fine while browser requests (Chrome / Firefox) seem to time out, or at least some of them do. See the attached log. The first 3 requests seem to be fine so far. Then, much later, there's a 504 between more 200s. I'm not sure yet why the other 200s there seem to wait / are logged after the actual timeout happens. According to Chrome's F12 tools there are more requests still pending. Ignore the 503 there; that seems to be an unrelated problem, since it also happens with a working HAProxy. Much later, the site loaded, sometimes broken though. I'll try to prepare a config snippet if required. Is there anything known already? -- Regards, Christian Ruppert

1.2.3.4:48262 [08/Feb/2021:14:07:46.764] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/0/42/42 200 11980 - - 2/1/0/0/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|||} "GET /secure/Dashboard.jspa HTTP/1.1"
1.2.3.4:48274 [08/Feb/2021:14:07:47.012] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/0/8/8 200 732 - - 11/6/4/4/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/d41d8cd98f00b204e9800998ecf8427e-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component.css HTTP/1.1"
1.2.3.4:48278 [08/Feb/2021:14:07:47.012]
genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/1/7/8 200 2594 - - 11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/ff9c3ef8b3ac69e6c33e26ebff0feeac-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/14305ce2982b2bea8ec24ee5b182b6c7/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css HTTP/1.1" 1.2.3.4:48278 [08/Feb/2021:14:07:47.042] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/0/24/24 200 3460 - - 12/6/5/5/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/00712794fa9ae1af7f4bae6d811706f6-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component.js?locale=de-DE HTTP/1.1" 1.2.3.4:48274 [08/Feb/2021:14:07:47.036] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/0/-1/30 504 203 - - sH-- 12/6/4/4/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/9a5257d0fa632d5edfa0967de8b1c7df-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/6aeee818fc5e706562782156532c027f/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=de-DE HTTP/1.1" 1.2.3.4:48276 [08/Feb/2021:14:07:47.012] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/1/13/300035 200 144662 - - sD-- 11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/fc667f1dddeeda81ff75f751da43391f-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/c2b5a025bbb84cbbb3ec1b499bc08403/_/download/contextbatch/css/atl.dashboard,jira.global,atl.general,jira.dashboard,-_super/batch.css?agile_global_admin_condition=true&jag=true HTTP/1.1" 1.2.3.4:48276 [08/Feb/2021:14:12:47.046] genfrontend_23510-somecorp_jira_prod~ genbackend_23540-somecorp_jira_prod/localhost 0/0/0/7/7 200 861 - - 11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET /s/d41d8cd98f00b204e9800998ecf8427e-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-lib/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-lib.js HTTP/1.1" 1.2.3.4:48274 [08/Feb/2021:14:12:47.058
Re: [PR] hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included
No Problem at all. Feel free :) On 2020-12-21 12:59, Willy Tarreau wrote: On Mon, Dec 21, 2020 at 12:20:36PM +0100, Christian Ruppert wrote: > 2) we include from this file so that it becomes > consistent > with everything else ; > > 3) we add the ifdef VAR_ARRAY directly into the file so that it > continues > not to depend on anything and can be directly imported into other > projects as needed. > > I guess I prefer the 3rd option here as it's extremely cheap and will > keep external build setups very straightforward. What do you think ? > > Thanks! > Willy 2. and 3. sounds good. 3. however seems to be the best solution, indeed. OK, do you mind if I just modify your patch and commit message according to this ? Or do you prefer to send a new one ? I'm asking because while I usually have no problem modifying patches or commit messages, I don't do it when they're signed. Thanks, Willy -- Regards, Christian Ruppert
Re: [PR] hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included
Hey Willy, On 2020-12-21 11:36, Willy Tarreau wrote: Hi, On Sun, Dec 20, 2020 at 12:58:52PM +0500, ??? wrote: ping :) Oh I completely missed this one in the noise it seems! I'm sorry. No problem! :) > Author: Christian Ruppert > Number of patches: 1 > > This is an automated relay of the Github pull request: >hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included I initially tried hard not to put haproxy-specific dependencies in these protocol-specific parts so that they could easily be reused by other projects if needed (hence the relaxed MIT license). But I guess adding compiler.h is not that big of a deal. However I disagree with including it from the same directory with double-quotes, as we try to keep our includes more or less ordered with certain dependencies. Thus Christian, I can offer 3 possibilities here, I don't know which one best suits your use case: 1) we include from this file. It will best follow the current practices all over the code, but may or may not work for your use case depending how you include the file; 2) we include from this file so that it becomes consistent with everything else ; 3) we add the ifdef VAR_ARRAY directly into the file so that it continues not to depend on anything and can be directly imported into other projects as needed. I guess I prefer the 3rd option here as it's extremely cheap and will keep external build setups very straightforward. What do you think ? Thanks! Willy 2. and 3. sounds good. 3. however seems to be the best solution, indeed. -- Regards, Christian Ruppert
Storing src + backend or frontend name in stick-table
Hi List, is it possible to store both the IP (src) and the frontend and/or backend name in a stick table? We use the IP in some frontends; the frontend/backend name is only for visibility/informational purposes. We have pretty huge configs with several hundred frontends/backends, and we'd like to know things like where a bot triggered some action. -- Regards, Christian Ruppert
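One common way to do this (a sketch, not a confirmed recipe; the table name, sizes and stored counter are illustrative) is a string-typed table whose key is the address with fe_name appended via the concat converter, since concat takes a variable rather than a fetch:

```
backend st_per_fe
    stick-table type string len 64 size 1m expire 30m store http_req_rate(10s)

frontend fe_example
    bind :80
    # key becomes e.g. "203.0.113.7:fe_example"
    http-request set-var(txn.fe) fe_name
    http-request track-sc0 src,concat(:,txn.fe) table st_per_fe
```

`show table st_per_fe` on the stats socket then lists entries with the frontend name embedded in each key, which gives the "where did the bot trigger it" visibility without a second table.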
Re: HTTP/2 in 2.1.x behaves different than in 2.0.x
Hi Jerome, Willy, thanks! Yeah, it only affected url, not path. I've fixed all cases where we wrongly assumed that url is like path. Thanks for clarifying! On 2020-07-03 19:59, Willy Tarreau wrote: On Fri, Jul 03, 2020 at 02:25:33PM +0200, Jerome Magnin wrote: Hi Christian, On Fri, Jul 03, 2020 at 11:02:48AM +0200, Christian Ruppert wrote: > Hi List, > > we've just noticed and confirmed some strange change in behavior, depending > on whether the request is made with HTTP 1.x or 2.x. > [...] > That also affects ACLs like url*/path* and probably others. > I don't think that is intended, isn't it? > That looks like a regression to me. If that is a bug/regression, than it > might be good if it's possible to catch that one via test case (regtest). > This change is intentional and not a regression, it was introduced by this commit: http://git.haproxy.org/?p=haproxy.git;a=commit;h=30ee1efe676e8264af16bab833c621d60a72a4d7 Yep, it's the only way not to break end-to-end transmission, which is even harder when H1 is used first and H2 behind. Also please note that "path" is *not* broken because it's already taken from the right place. "url" will see changes when comparing with the previous version which would see a path in H2, or either a path or a uri in H1. Because if you're using "url", in H1 you can already have the two forms. Now what haproxy does is to preserve each URL component intact. If you change the scheme it only changes it. If you call "set-path" it will only change the path, if you use "replace-uri" it will replace the whole uri. I'd say that HTTP/2 with the :authority header was made very browser-centric and went back to the origins of the URIs. It's certain that for all of us working more on the server side it looks unusual but for those on the client side it's more natural.
Regardless, what it does was already supported by HTTP/1 agents and even used to communicate with proxies, so it's not a fundamental breakage, it just emphasizes something that people were not often thinking about. Hoping this helps, Willy -- Regards, Christian Ruppert
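Willy's distinction between "url" and "path" can be turned into a small ACL sketch (frontend name, certificate path and ACL names are illustrative): matching on "path" behaves identically for H1 and H2, while "url" may now carry the absolute form under H2.

```haproxy
frontend fe_example
    mode http
    bind :8443 ssl crt /etc/haproxy/example.pem alpn h2,http/1.1
    # robust for both H1 and H2: "path" is always just /admin...
    acl is_admin path_beg /admin
    # fragile since 2.1: under H2 "url" can be the absolute URI,
    # e.g. https://example.com/admin, so this would no longer match:
    # acl is_admin url_beg /admin
    http-request deny if is_admin
```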
HTTP/2 in 2.1.x behaves different than in 2.0.x
Hi List, we've just noticed and confirmed some strange change in behavior, depending on whether the request is made with HTTP 1.x or 2.x. Steps to reproduce: HAProxy 2.1.x A simple http frontend, including h2 + logging tail -f /var/log/haproxy.log|grep curl curl -s https://example.com -o /dev/null --http1.1 curl -s https://example.com -o /dev/null --http2 Notice the difference: test_https~ backend_test/testsrv1 1/0/0/2/3 200 4075 - - 1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/1.1" test_https~ backend_test/testsrv1 0/0/0/3/3 200 4075 - - 1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET https://example.com/ HTTP/2.0" Now the same with HAProxy 2.0.14: test_https~ backend_test/testsrv1 1/0/0/2/3 200 4075 - - 1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/1.1" test_https~ backend_test/testsrv1 0/0/0/3/3 200 4075 - - 1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/2.0" That also affects ACLs like url*/path* and probably others. I don't think that is intended, is it? That looks like a regression to me. If that is a bug/regression, then it might be good if it's possible to catch that one via a test case (regtest). -- Regards, Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:58, Christian Ruppert wrote: On 2020-03-27 16:49, Olivier Houchard wrote: On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote: On 2020-03-27 16:27, Olivier Houchard wrote: > On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote: >> During the reload I just found something in the daemon log: >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : >> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540] >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : >> Starting proxy someotherlistener: cannot bind socket [:::18540] >> >> So during the reload, this happened and seems to have caused any >> further >> issues/trouble. >> > > That would make sense. Does that mean you have old processes hanging > around ? Do you use seamless reload ? If so, it shouldn't attempt to > bind the socket, but get them from the old process. I remember that it was necessary to have a systemd wrapper around, as it caused trouble otherwise, due to PID being changed etc. Not sure if that wrapper is still in use. In this case it's systemd though. [Unit] Description=HAProxy Load Balancer After=network.target [Service] Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q ExecReload=/bin/kill -USR2 $MAINPID KillMode=mixed Restart=always SuccessExitStatus=143 TimeoutStopSec=30 Type=notify [...] We've added the TimeoutStopSec=30 for some reason (I'd have to ask my colleague; something took longer or something like that, since we have quite a lot of frontends/listeners/backends). Only the two processes I mentioned before are / were running. Seems like the fallback didn't work properly? The wrapper is no longer needed, it has been superseded by the master-worker (which you seem to be using, given you're using -Ws). 
It is possible the old process refuses to die, and you end up hitting the timeout and it gets killed eventually, but it's too late. Do you have "expose-fd listeners" on the unix stats socket? Using it will allow the new process to connect to the old process' stats socket, and get all the listening sockets, so that it won't have to bind them. Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish. Thanks for the hint! https://www.haproxy.com/de/blog/hitless-reloads-with-haproxy-howto/ "Please note that this step does not need to be performed if your HAProxy configuration already contains the directive "master-worker", or if it is started with the option -W." I have steps to reproduce it: A C sample to bind the socket (nc doesn't work for some reason):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

int main() {
    int sock;
    struct sockaddr_in server;

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        printf("Failed to create socket!\n");
    }

    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = htons(1338);

    if (bind(sock, (struct sockaddr *)&server, sizeof(server)) == -1) {
        printf("Failed to bind socket!\n");
    }

    while (1) {
        sleep(1);
    }
    return 0;
}

gcc socket.c -o socket
./socket

Having an initial HAProxy config:

global
    user haproxy
    group haproxy
    log-send-hostname
    log 127.0.0.1 len 65535 local0
    stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin

frontend unixsocket_reload
    bind 127.0.0.1:1337
    bind unix@/run/haproxy-sockettest.sock user haproxy group root mode 600
    mode http
    log global

And starting it with systemd, ending up in: /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Testing:
curl --unix-socket /run/haproxy-sockettest.sock http://127.0.0.1 -vs
echo help | socat unix-connect:/run/haproxy.stat stdio

Adding a second frontend to the haproxy.cfg:

frontend unixsocket_reload2
    bind 127.0.0.1:1338
    bind unix@/run/haproxy-sockettest-2.sock user haproxy group root mode 600
    mode http
    log global

systemctl reload haproxy

curl and socat don't work anymore while the TCP socket still works. Now restarting HAProxy with the initial config but with the adjusted stats socket:

stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin expose-fd listeners

Note that the -x will be appended automatically (at least for systemd -Ws). And doing the same again: curl and socat still work. The new frontend does not, even though its UNIX socket is created. I think the way that works is OK for me then. Thanks for pointing out the expose-fd listeners! Regards, Olivier -- Regards, Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:49, Olivier Houchard wrote: On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote: On 2020-03-27 16:27, Olivier Houchard wrote: > On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote: >> During the reload I just found something in the daemon log: >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : >> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540] >> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : >> Starting proxy someotherlistener: cannot bind socket [:::18540] >> >> So during the reload, this happened and seems to have caused any >> further >> issues/trouble. >> > > That would make sense. Does that mean you have old processes hanging > around ? Do you use seamless reload ? If so, it shouldn't attempt to > bind the socket, but get them from the old process. I remember that it was necessary to have a systemd wrapper around, as it caused trouble otherwise, due to PID being changed etc. Not sure if that wrapper is still in use. In this case it's systemd though. [Unit] Description=HAProxy Load Balancer After=network.target [Service] Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q ExecReload=/bin/kill -USR2 $MAINPID KillMode=mixed Restart=always SuccessExitStatus=143 TimeoutStopSec=30 Type=notify [...] We've added the TimeoutStopSec=30 for some reason (I'd have to ask my colleague; something took longer or something like that, since we have quite a lot of frontends/listeners/backends). Only the two processes I mentioned before are / were running. Seems like the fallback didn't work properly? The wrapper is no longer needed, it has been superseded by the master-worker (which you seem to be using, given you're using -Ws). 
It is possible the old process refuses to die, and you end up hitting the timeout and it gets killed eventually, but it's too late. Do you have "expose-fd listeners" on the unix stats socket? Using it will allow the new process to connect to the old process' stats socket, and get all the listening sockets, so that it won't have to bind them. Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish. Thanks for the hint! Regards, Olivier -- Regards, Christian Ruppert
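For reference, Olivier's suggestion is a single keyword added to the stats socket line; a sketch assuming the usual socket path (path and permissions are illustrative):

```haproxy
global
    # "expose-fd listeners" lets the new worker fetch the listening
    # FDs from the old process over this socket during a reload,
    # instead of re-binding the ports itself
    stats socket unix@/run/haproxy.stat user haproxy mode 600 level admin expose-fd listeners
```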
Re: Weird issues with UNIX-Sockets on 2.1.x
On 2020-03-27 16:27, Olivier Houchard wrote: On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote: During the reload I just found something in the daemon log: Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540] Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : Starting proxy someotherlistener: cannot bind socket [:::18540] So during the reload, this happened and seems to have caused any further issues/trouble. That would make sense. Does that mean you have old processes hanging around ? Do you use seamless reload ? If so, it shouldn't attempt to bind the socket, but get them from the old process. I remember that it was necessary to have a systemd wrapper around, as it caused trouble otherwise, due to PID being changed etc. Not sure if that wrapper is still in use. In this case it's systemd though. [Unit] Description=HAProxy Load Balancer After=network.target [Service] Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q ExecReload=/bin/kill -USR2 $MAINPID KillMode=mixed Restart=always SuccessExitStatus=143 TimeoutStopSec=30 Type=notify # The following lines leverage SystemD's sandboxing options to provide # defense in depth protection at the expense of restricting some flexibility # in your setup (e.g. placement of your configuration files) or possibly # reduced performance. See systemd.service(5) and systemd.exec(5) for further # information. # NoNewPrivileges=true # ProtectHome=true # If you want to use 'ProtectSystem=strict' you should whitelist the PIDFILE, # any state files and any other files written using 'ReadWritePaths' or # 'RuntimeDirectory'. 
# ProtectSystem=true # ProtectKernelTunables=true # ProtectKernelModules=true # ProtectControlGroups=true # If your SystemD version supports them, you can add: @reboot, @swap, @sync # SystemCallFilter=~@cpu-emulation @keyring @module @obsolete @raw-io [Install] WantedBy=multi-user.target We've added the TimeoutStopSec=30 for some reason (I'd have to ask my colleague; something took longer or something like that, since we have quite a lot of frontends/listeners/backends). Only the two processes I mentioned before are / were running. Seems like the fallback didn't work properly? Regards, Olivier -- Regards, Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
During the reload I just found something in the daemon log: Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540] Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : Starting proxy someotherlistener: cannot bind socket [:::18540] So during the reload, this happened and seems to have caused any further issues/trouble. On 2020-03-27 15:10, Christian Ruppert wrote: So now I looked for more of those "SC"'s in the log, from our monitoring and it appeared first around 13:38:01. Around 13:37:54 a reload was issued by puppet or rundeck. So right now, it seems that something happened during the reload which affected UNIX sockets. On 2020-03-27 15:00, Christian Ruppert wrote: Hi Olivier, On 2020-03-27 14:50, Olivier Houchard wrote: Hi Christian, On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote: Hi list, we have some weird issues now, the second time, that *some* SSL sockets seem to be broken as well as stats sockets. HTTP seems to work fine, still, SSL ones are broken however. It happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether the first time was on 2.1.2 or 2.1.3. The one that failed today was updated yesterday, so HAProxy has an uptime of about 24h. We're using threads. default + HTTP is using 1 thread, 1 is dedicated for a TCP listener/Layer-4, one is for RSA only and all the rest is for ECC. [...] The problem occurred around 13:40 (CET, in case it matters at some point) Any ideas so far? So basically, it used to work, and suddenly you get errors on any TLS connection ? Yeah, right now it looks like that way. If you still have the TCP stat socket working, can you show the output of "show fd" ? Oh, it's the http stats listener that's still working. Not sure whether it accepts any commands to be honest. 
pid = 21313 (process #1, nbproc = 1, nbthread = 8) uptime = 0d 1h56m48s system limits: memmax = unlimited; ulimit-n = 1574819 maxsock = 1574819; maxconn = 786432; maxpipes = 0 current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps Running tasks: 1/1158; idle = 100 % Thanks ! Olivier -- Regards, Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
So now I looked for more of those "SC"'s in the log, from our monitoring and it appeared first around 13:38:01. Around 13:37:54 a reload was issued by puppet or rundeck. So right now, it seems that something happened during the reload which affected UNIX sockets. On 2020-03-27 15:00, Christian Ruppert wrote: Hi Olivier, On 2020-03-27 14:50, Olivier Houchard wrote: Hi Christian, On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote: Hi list, we have some weird issues now, the second time, that *some* SSL sockets seem to be broken as well as stats sockets. HTTP seems to work fine, still, SSL ones are broken however. It happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether the first time was on 2.1.2 or 2.1.3. The one that failed today was updated yesterday, so HAProxy has an uptime of about 24h. We're using threads. default + HTTP is using 1 thread, 1 is dedicated for a TCP listener/Layer-4, one is for RSA only and all the rest is for ECC. [...] The problem occurred around 13:40 (CET, in case it matters at some point) Any ideas so far? So basically, it used to work, and suddenly you get errors on any TLS connection ? Yeah, right now it looks like that way. If you still have the TCP stat socket working, can you show the output of "show fd" ? Oh, it's the http stats listener that's still working. Not sure whether it accepts any commands to be honest. pid = 21313 (process #1, nbproc = 1, nbthread = 8) uptime = 0d 1h56m48s system limits: memmax = unlimited; ulimit-n = 1574819 maxsock = 1574819; maxconn = 786432; maxpipes = 0 current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps Running tasks: 1/1158; idle = 100 % Thanks ! Olivier -- Regards, Christian Ruppert
Re: Weird issues with UNIX-Sockets on 2.1.x
Hi Olivier, On 2020-03-27 14:50, Olivier Houchard wrote: Hi Christian, On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote: Hi list, we have some weird issues now, the second time, that *some* SSL sockets seem to be broken as well as stats sockets. HTTP seems to work fine, still, SSL ones are broken however. It happened at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether the first time was on 2.1.2 or 2.1.3. The one that failed today was updated yesterday, so HAProxy has an uptime of about 24h. We're using threads. default + HTTP is using 1 thread, 1 is dedicated for a TCP listener/Layer-4, one is for RSA only and all the rest is for ECC. [...] The problem occurred around 13:40 (CET, in case it matters at some point) Any ideas so far? So basically, it used to work, and suddenly you get errors on any TLS connection ? Yeah, right now it looks like that way. If you still have the TCP stat socket working, can you show the output of "show fd" ? Oh, it's the http stats listener that's still working. Not sure whether it accepts any commands to be honest. pid = 21313 (process #1, nbproc = 1, nbthread = 8) uptime = 0d 1h56m48s system limits: memmax = unlimited; ulimit-n = 1574819 maxsock = 1574819; maxconn = 786432; maxpipes = 0 current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 219.704 kbps Running tasks: 1/1158; idle = 100 % Thanks ! Olivier -- Regards, Christian Ruppert
Weird issues with UNIX-Sockets on 2.1.x
-clobbered -Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference OPTIONS = USE_PCRE=1 USE_PCRE_JIT= USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_NS= USE_SYSTEMD=1 Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS Default settings : bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 Built with multi-threading support (MAX_THREADS=64, default=8). Built with OpenSSL version : OpenSSL 1.1.0l 10 Sep 2019 Running on OpenSSL version : OpenSSL 1.1.0l 10 Sep 2019 OpenSSL library supports TLS extensions : yes OpenSSL library supports SNI : yes OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 Built with Lua version : Lua 5.3.3 Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND Built with PCRE version : 8.39 2016-06-14 Running on PCRE version : 8.39 2016-06-14 PCRE library supports JIT : no (USE_PCRE_JIT not set) Encrypted password support via crypt(3): yes Built with zlib version : 1.2.8 Running on zlib version : 1.2.8 Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip") Built with the Prometheus exporter as a service Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. 
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP  side=FE|BE  mux=H2
            fcgi : mode=HTTP  side=BE     mux=FCGI
       <default> : mode=HTTP  side=FE|BE  mux=H1
       <default> : mode=TCP   side=FE|BE  mux=PASS
Available services : prometheus-exporter
Available filters :
        [SPOE] spoe
        [CACHE] cache
        [FCGI] fcgi-app
        [TRACE] trace
        [COMP] compression
-- Regards, Christian Ruppert
Re: master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use
On 2019-11-20 11:05, William Lallemand wrote: On Wed, Nov 20, 2019 at 10:19:20AM +0100, Christian Ruppert wrote: Hi William, thanks for the patch. I'll test it later today. What I actually wanted to achieve is: https://cbonte.github.io/haproxy-dconv/2.0/management.html#4 Then HAProxy tries to bind to all listening ports. If some fatal errors happen (eg: address not present on the system, permission denied), the process quits with an error. If a socket binding fails because a port is already in use, then the process will first send a SIGTTOU signal to all the pids specified in the "-st" or "-sf" pid list. This is what is called the "pause" signal. It instructs all existing haproxy processes to temporarily stop listening to their ports so that the new process can try to bind again. During this time, the old process continues to process existing connections. If the binding still fails (because for example a port is shared with another daemon), then the new process sends a SIGTTIN signal to the old processes to instruct them to resume operations just as if nothing happened. The old processes will then restart listening to the ports and continue to accept connections. Note that this mechanism is system [...] In my test case though it failed to do so. Well, it only works with HAProxy processes, not with other processes. There is no mechanism to ask a process which is neither an haproxy process nor a process which uses SO_REUSEPORT. With HAProxy processes it will bind with SO_REUSEPORT, and will only use the SIGTTOU/SIGTTIN signals if it fails to do so. This part of the documentation is for HAProxy without master-worker mode. In master-worker mode, once the master is launched successfully it is never supposed to quit upon a reload (kill -USR2). During a reload in master-worker mode, the master will do a -sf . If the reload failed for any reason (bad configuration, unable to bind etc.), the behavior is to keep the previous workers. 
It only tries to kill the workers if the reload succeeds. So this is the default behavior. Your patch seems to fix the issue. The master process won't exit anymore. Fallback seems to work during my initial tests. Thanks! -- Regards, Christian Ruppert
Re: Combining (kind of) http and tcp checks
Hi Aleks, On 2019-11-21 11:01, Aleksandar Lazic wrote: Hi. On 2019-11-21 10:49, Christian Ruppert wrote: Hi list, for an old exchange cluster I have some check listeners like: listen chk_s015023 bind 0.0.0.0:1001 mode http monitor-uri /check tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST } monitor fail if { nbsrv lt 6 } default-server inter 3s rise 2 fall 3 server s015023_smtp 192.168.15.23:25 check server s015023_pop3 192.168.15.23:110 check server s015023_imap 192.168.15.23:143 check server s015023_https 192.168.15.23:443 check server s015023_imaps 192.168.15.23:993 check server s015023_pop3s 192.168.15.23:995 check Which is then being used by the actual backends like: backend bk_exchange_https mode http option httpchk HEAD /check HTTP/1.0 server s015023 192.168.15.23:443 ssl verify none check addr 127.0.0.1 port 1001 observe layer4 server s015024 192.168.15.24:443 ssl verify none check addr 127.0.0.1 port 1002 observe layer4 ... The old cluster is currently being updated and there's an included health check available for Exchange which I'd like to use. 
So I was thinking about something like: listen chk_s015023_healthcheck bind 0.0.0.0:1003 mode http monitor-uri /check_exchange tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST } monitor fail if { nbsrv lt 1 } default-server inter 3s rise 2 fall 3 option httpchk GET /owa/healthcheck.htm HTTP/1.0 server s015023_health 192.168.15.23:443 check ssl verify none listen chk_s015023 bind 0.0.0.0:1001 mode http monitor-uri /check tcp-request connection reject if { nbsrv lt 7 } { src LOCALHOST } monitor fail if { nbsrv lt 7 } default-server inter 3s rise 2 fall 3 server s015023_smtp 192.168.15.23:25 check server s015023_pop3 192.168.15.23:110 check server s015023_imap 192.168.15.23:143 check server s015023_https 192.168.15.23:443 check server s015023_imaps 192.168.15.23:993 check server s015023_pop3s 192.168.15.23:995 check server chk_s015023_healthcheck 127.0.0.1:1003 check The new healthcheck is marked as being down/up as expected; the problem is that the TCP check for that new health check "server chk_s015023_healthcheck 127.0.0.1:1003 check" doesn't work. Even though we have that "tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }" within the new check, it doesn't seem to be enough for the TCP check. Is it somehow possible to combine both checks, to make it recognize the new check's status properly? I'd like to avoid using an external check script to do all those checks. Maybe you can use the track feature from haproxy for that topic. https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.2-track I have never used it but it looks like exactly what you want. 1 backend for tcp checks and 1 backend for http, right? Regards Aleks Thanks! 
That seems to do the trick:

listen chk_s015023_healthcheck
    bind 0.0.0.0:1003
    mode http
    monitor-uri /check_exchange
    tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }
    monitor fail if { nbsrv lt 1 }
    default-server inter 3s rise 2 fall 3
    option httpchk GET /owa/healthcheck.htm HTTP/1.0
    server s015023_health 192.168.15.23:443 check ssl verify none

listen chk_s015023
    bind 0.0.0.0:1001
    mode http
    monitor-uri /check
    tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST }
    monitor fail if { nbsrv lt 6 }
    default-server inter 3s rise 2 fall 3
    server s015023_smtp 192.168.15.23:25 check
    server s015023_pop3 192.168.15.23:110 check
    server s015023_imap 192.168.15.23:143 check
    server s015023_https 192.168.15.23:443 track chk_s015023_healthcheck/s015023_health
    server s015023_imaps 192.168.15.23:993 check
    server s015023_pop3s 192.168.15.23:995 check

-- Regards, Christian Ruppert
Combining (kind of) http and tcp checks
Hi list, for an old exchange cluster I have some check listeners like: listen chk_s015023 bind 0.0.0.0:1001 mode http monitor-uri /check tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST } monitor fail if { nbsrv lt 6 } default-server inter 3s rise 2 fall 3 server s015023_smtp 192.168.15.23:25 check server s015023_pop3 192.168.15.23:110 check server s015023_imap 192.168.15.23:143 check server s015023_https 192.168.15.23:443 check server s015023_imaps 192.168.15.23:993 check server s015023_pop3s 192.168.15.23:995 check Which is then being used by the actual backends like: backend bk_exchange_https mode http option httpchk HEAD /check HTTP/1.0 server s015023 192.168.15.23:443 ssl verify none check addr 127.0.0.1 port 1001 observe layer4 server s015024 192.168.15.24:443 ssl verify none check addr 127.0.0.1 port 1002 observe layer4 ... The old cluster is currently being updated and there's an included health check available for Exchange which I'd like to use. So I was thinking about something like: listen chk_s015023_healthcheck bind 0.0.0.0:1003 mode http monitor-uri /check_exchange tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST } monitor fail if { nbsrv lt 1 } default-server inter 3s rise 2 fall 3 option httpchk GET /owa/healthcheck.htm HTTP/1.0 server s015023_health 192.168.15.23:443 check ssl verify none listen chk_s015023 bind 0.0.0.0:1001 mode http monitor-uri /check tcp-request connection reject if { nbsrv lt 7 } { src LOCALHOST } monitor fail if { nbsrv lt 7 } default-server inter 3s rise 2 fall 3 server s015023_smtp 192.168.15.23:25 check server s015023_pop3 192.168.15.23:110 check server s015023_imap 192.168.15.23:143 check server s015023_https 192.168.15.23:443 check server s015023_imaps 192.168.15.23:993 check server s015023_pop3s 192.168.15.23:995 check server chk_s015023_healthcheck 127.0.0.1:1003 check The new healthcheck is marked as being down/up as expected; the problem is that the TCP check for that new health check 
"server chk_s015023_healthcheck 127.0.0.1:1003 check" doesn't work. Even though we have that "tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }" within the new check, it doesn't seem to be enough for the TCP check. Is it somehow possible to combine both checks, to make it recognize the new check's status properly? I'd like to avoid using an external check script to do all those checks. -- Regards, Christian Ruppert
Re: master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use
Hi William, thanks for the patch. I'll test it later today. What I actually wanted to achieve is: https://cbonte.github.io/haproxy-dconv/2.0/management.html#4 Then HAProxy tries to bind to all listening ports. If some fatal errors happen (eg: address not present on the system, permission denied), the process quits with an error. If a socket binding fails because a port is already in use, then the process will first send a SIGTTOU signal to all the pids specified in the "-st" or "-sf" pid list. This is what is called the "pause" signal. It instructs all existing haproxy processes to temporarily stop listening to their ports so that the new process can try to bind again. During this time, the old process continues to process existing connections. If the binding still fails (because for example a port is shared with another daemon), then the new process sends a SIGTTIN signal to the old processes to instruct them to resume operations just as if nothing happened. The old processes will then restart listening to the ports and continue to accept connections. Note that this mechanism is system [...] In my test case though it failed to do so. On 2019-11-19 17:27, William Lallemand wrote: On Tue, Nov 19, 2019 at 04:19:26PM +0100, William Lallemand wrote: > I then add another bind for port 80, which is in use by squid already > and try to reload HAProxy. It takes some time until it fails: > > Nov 19 14:39:21 894a0f616fec haproxy[2978]: [WARNING] 322/143921 (2978) > : Reexecuting Master process > ... > Nov 19 14:39:28 894a0f616fec haproxy[2978]: [ALERT] 322/143922 (2978) : > Starting frontend somefrontend: cannot bind socket [0.0.0.0:80] > ... > Nov 19 14:39:28 894a0f616fec systemd[1]: haproxy.service: Main process > exited, code=exited, status=1/FAILURE > > The reload itself is still running (systemd) and will timeout after > about 90s. After that, because of the Restart=always, I guess, it ends > up in a restart loop. 
> > So I would have expected that the master process will fallback to the > old process and proceed with the old child until the problem has been > fixed. > The patch in attachment fixes a bug where haproxy could reexecute itself in waitpid mode with -sf -1. I'm not sure this is your bug, but if this is the case you should see haproxy in waitpid mode, then the master exiting with the usage message in your logs. -- Regards, Christian Ruppert
master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use
Hi list, I'm facing some issues with already in use ports and the fallback feature, during a reload. SO_REUSEPORT already makes it easier/better but not perfect, as there are still cases where it fails. In my test case I've got a Squid running on port 80 and a HAProxy with "master-worker no-exit-on-failure". I am using the shipped (2.0.8) systemd unit file and start up HAProxy with some frontend and a bind on a port like 1337 or something. I then add another bind for port 80, which is in use by squid already, and try to reload HAProxy. It takes some time until it fails: Nov 19 14:39:21 894a0f616fec haproxy[2978]: [WARNING] 322/143921 (2978) : Reexecuting Master process ... Nov 19 14:39:28 894a0f616fec haproxy[2978]: [ALERT] 322/143922 (2978) : Starting frontend somefrontend: cannot bind socket [0.0.0.0:80] ... Nov 19 14:39:28 894a0f616fec systemd[1]: haproxy.service: Main process exited, code=exited, status=1/FAILURE The reload itself is still running (systemd) and will timeout after about 90s. After that, because of the Restart=always, I guess, it ends up in a restart loop. So I would have expected that the master process will fall back to the old process and proceed with the old child until the problem has been fixed. Can anybody confirm that? Is that intended? https://cbonte.github.io/haproxy-dconv/2.0/management.html#4 https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#3.1-master-worker -- Regards, Christian Ruppert
Re: H/2 via Unix Sockets fails
Hi Jarno, On 2019-06-04 12:44, Jarno Huuskonen wrote: Hi Christian, On Thu, Apr 25, Christian Ruppert wrote: listen genlisten_10320-cust1.tls-tcp acl REQ_TLS_HAS_ECC req.ssl_ec_ext eq 1 tcp-request content accept if { req_ssl_hello_type 1 } # Match Client SSL Hello use-server socket-10320-rsa if !REQ_TLS_HAS_ECC server socket-10320-rsa unix@/run/haproxy-10320-rsa.sock send-proxy-v2 use-server socket-10320-ecc if REQ_TLS_HAS_ECC server socket-10320-ecc unix@/run/haproxy-10320-ecc.sock send-proxy-v2 Do you need this tcp frontend for just serving both rsa/ecc certificates ? If so I think haproxy can do this(with openssl >= 1.0.2) with crt keyword: https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.1-crt -Jarno listen genlisten_10320-cust1.tls bind unix@/run/haproxy-10320-rsa.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-rsa.pem alpn h2,http/1.1 process 3 bind unix@/run/haproxy-10320-ecc.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-ecc.pem alpn h2,http/1.1 process 4-8 Yeah, I think we'll still need that construct. What we want to achieve with this kind of setup is: One process/core for pure connections (that TCP stuff), one for HTTP, *one* for RSA and all the rest for ECC. RSA costs so much that it's really easy to (D)DoS that process which would otherwise affect all other processes as well. So we just want to have all that separated, http from https and RSA from ECC. -- Regards, Christian Ruppert
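For completeness, Jarno's crt-based suggestion would look roughly like the following single bind (cert paths taken from the test config elsewhere in the thread): with OpenSSL >= 1.0.2 the library picks the ECDSA certificate when the client advertises support for it, and falls back to RSA otherwise, so no separate TCP routing tier is needed just for cert selection.

```haproxy
listen genlisten_10320-cust1.tls
    mode http
    # both certs on one bind; OpenSSL selects ECDSA vs RSA per client
    bind :10320 ssl crt /etc/haproxy/test-ecc.pem crt /etc/haproxy/test-rsa.pem alpn h2,http/1.1
    # servers / default_backend as before
```

It doesn't give the per-key-type process isolation described above, which is why the split-socket construct can still be preferable.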
Re: H/2 via Unix Sockets fails
Hi Jarno, thanks, your proposal seems to work. Here's a working test config based on one of our production configs:

curl -kvs -o /dev/null https://127.0.0.1:10320 --http1.1
Apr 25 15:32:51 localhost haproxy[2847]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.554] genfrontend_10310-cust1 genfrontend_10310-cust1/ -1/-1/-1/-1/0 503 212 - - SC-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Apr 25 15:32:51 localhost haproxy[2846]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.553] genlisten_10320-cust1.tls~ genlisten_10320-cust1.tls/socket-10310 1/0/1 212 -- 1/1/0/0/0 0/0
Apr 25 15:32:51 localhost haproxy[2841]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.549] genlisten_10320-cust1.tls-tcp genlisten_10320-cust1.tls-tcp/socket-10320-ecc 4/0/5 995 -- 1/1/0/0/0 0/0

curl -kvs -o /dev/null https://127.0.0.1:10320 --http2
Apr 25 15:32:59 localhost haproxy[2847]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.246] genfrontend_10310-cust1 genfrontend_10310-cust1/ -1/-1/-1/-1/0 503 212 - - SC-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Apr 25 15:32:59 localhost haproxy[2845]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.243] genlisten_10320-cust1.tls~ genlisten_10320-cust1.tls/socket-10310-h2 3/0/3 184 -- 1/1/0/0/0 0/0
Apr 25 15:32:59 localhost haproxy[2841]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.228] genlisten_10320-cust1.tls-tcp genlisten_10320-cust1.tls-tcp/socket-10320-ecc 16/0/19 990 CD 1/1/0/0/0 0/0

global
    nbproc 8
    # ...

listen genlisten_10320-cust1.tls-tcp
    mode tcp
    bind-process 2
    bind :10320
    log global
    option tcplog
    # ...
    tcp-request inspect-delay 7s
    acl REQ_TLS_HAS_ECC req.ssl_ec_ext eq 1
    tcp-request content accept if { req_ssl_hello_type 1 } # Match Client SSL Hello
    use-server socket-10320-rsa if !REQ_TLS_HAS_ECC
    server socket-10320-rsa unix@/run/haproxy-10320-rsa.sock send-proxy-v2
    use-server socket-10320-ecc if REQ_TLS_HAS_ECC
    server socket-10320-ecc unix@/run/haproxy-10320-ecc.sock send-proxy-v2

listen genlisten_10320-cust1.tls
    mode tcp
    log global
    option tcplog
    bind-process 3-8
    bind unix@/run/haproxy-10320-rsa.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-rsa.pem alpn h2,http/1.1 process 3
    bind unix@/run/haproxy-10320-ecc.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-ecc.pem alpn h2,http/1.1 process 4-8
    use-server socket-10310-h2 if { ssl_fc_alpn h2 }
    server socket-10310-h2 unix@/run/haproxy-10310-h2.sock send-proxy-v2
    use-server socket-10310 if !{ ssl_fc_alpn h2 }
    server socket-10310 unix@/run/haproxy-10310.sock send-proxy-v2

frontend genfrontend_10310-cust1
    bind :10310
    bind unix@/run/haproxy-10310-h2.sock id 210312 accept-proxy user haproxy group root mode 600 proto h2 # TLS uplink H2
    bind unix@/run/haproxy-10310.sock id 210310 accept-proxy user haproxy group root mode 600 # TLS uplink
    mode http
    option httplog
    log global
    # ...

So it would be cool if both were possible, H2 as well as H1, via that socket, using "alpn h2,http/1.1".

--
Regards,
Christian Ruppert
Re: H/2 via Unix Sockets fails
Hi Willy, that doesn't seem to work either, only HTTP/1.1. We have several hundred listeners/frontends/backends and we're using the old nbproc > 1 process model. We have the initial TCP listener that's bound to one core. It checks whether it's ECC capable or not, then it goes to the second listener that does the actual SSL termination with RSA/ECC on multiple cores, and from there it goes to the actual frontend, which is on a different core. We plan to test and migrate to the threading model if it performs as well as the current one or even better. But actually that was meant for much later this year or even 2020 :( I'm not sure if that would solve the actual problem, since we may still need sockets for RSA/ECC I guess. The initial plan was to just make it also support HTTP/2 by adding "alpn h2,http/1.1" to the unix bind in the "h2test_tcp.tls" listener.

On 2019-04-24 15:06, Willy Tarreau wrote: Hi Christian, On Wed, Apr 24, 2019 at 02:29:40PM +0200, Christian Ruppert wrote: Hi, so I did some more tests and it seems to be an issue between h2test_tcp.tls and the frontend, using the UNIX sockets. Adding a TCP bind to that listener also doesn't work. Am I doing it wrong or is it a bug somewhere with H/2 and UNIX sockets? I also disabled the PROXY protocol - doesn't help. I currently have no idea about this one. There should be no reason for H2 to depend on the underlying socket type. Hmm wait a minute. It might not be related to the UNIX sockets at all. In fact what's happening is that your first proxy is not advertising H2 in the ALPN connection, so the second one doesn't receive it and negotiates H1.
You could try to add "alpn h2" at the end of your server line below :

listen h2test_tcp
    mode tcp
    bind :444
    option tcplog
    log global
    server socket-444-h2test unix@/run/haproxy-444-h2test.sock send-proxy-v2
                                                                            ^
listen h2test_tcp.tls
    mode tcp
    option tcplog
    log global
    bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn h2,http/1.1
    server socket-444_2 unix@/run/haproxy-444_2-h2test.sock send-proxy-v2
                                                                         ^
And on this one as well. However it will break your H1. What are you trying to do exactly ? Maybe there is a simpler solution. Willy

--
Regards,
Christian Ruppert
Re: H/2 via Unix Sockets fails
Hi, so I did some more tests and it seems to be an issue between h2test_tcp.tls and the frontend, using the UNIX sockets. Adding a TCP bind to that listener also doesn't work. Am I doing it wrong or is it a bug somewhere with H/2 and UNIX sockets? I also disabled the PROXY protocol - doesn't help. On 2019-04-23 15:57, Christian Ruppert wrote: Hey, we have an older setup using nbproc >1 and having a listener for the initial tcp connection and one for the actual SSL/TLS, also using tcp mode which then goes to the actual frontend using http mode. Each being bound to different processes. So here's the test config I've used: listen h2test_tcp mode tcp bind :444 option tcplog log global server socket-444-h2test unix@/run/haproxy-444-h2test.sock send-proxy-v2 listen h2test_tcp.tls mode tcp option tcplog log global bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn h2,http/1.1 server socket-444_2 unix@/run/haproxy-444_2-h2test.sock send-proxy-v2 frontend some_frontend mode http log global bind unix@/run/haproxy-444_2-h2test.sock id 444 accept-proxy user haproxy group haproxy mode 600 bind :80 ... So what I'm doing is: curl -k4vs https://127.0.0.1:444/~idl0r/ --http1.1 curl -k4vs https://127.0.0.1:444/~idl0r/ --http2 So with HTTP/1.1 I get: public_http backend_qasl_de/qasl1 0/0/0/0/0 200 510 - - 3/1/0/0/0 0/0 {127.0.0.1:444|curl/7.64.1|} "GET / HTTP/1.1" h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 5/1/6 605 -- 2/1/0/0/0 0/0 h2test_tcp h2test_tcp/socket-444-h2test 1/0/6 3335 CD 1/1/0/0/0 0/0 With H/2: public_http public_http/ -1/-1/-1/-1/0 400 187 - - PR-- 3/1/0/0/0 0/0 {||} "" h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 6/0/5 187 SD 2/1/0/0/0 0/0 h2test_tcp h2test_tcp/socket-444-h2test 1/0/5 2911 SD 1/1/0/0/0 0/0 curl says: # curl -k4vs https://127.0.0.1:444/ --http2 * Trying 127.0.0.1... 
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=...
* start date: Mar 1 18:00:17 2019 GMT
* expire date: May 30 18:00:17 2019 GMT
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56087e29b770)
GET / HTTP/2
Host: 127.0.0.1:444
User-Agent: curl/7.64.1
Accept: */*
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame. Perhaps, peer does not support HTTP/2 properly.
* Connection #0 to host 127.0.0.1 left intact
* Closing connection 0

Can anybody else confirm that? Tested with HAProxy 1.9.6. Any ideas what might be the reason? Right now, I'd guess that's a problem with H/2 and those sockets on the HAProxy side.

--
Regards,
Christian Ruppert
H/2 via Unix Sockets fails
Hey, we have an older setup using nbproc >1 and having a listener for the initial tcp connection and one for the actual SSL/TLS, also using tcp mode which then goes to the actual frontend using http mode. Each being bound to different processes. So here's the test config I've used: listen h2test_tcp mode tcp bind :444 option tcplog log global server socket-444-h2test unix@/run/haproxy-444-h2test.sock send-proxy-v2 listen h2test_tcp.tls mode tcp option tcplog log global bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn h2,http/1.1 server socket-444_2 unix@/run/haproxy-444_2-h2test.sock send-proxy-v2 frontend some_frontend mode http log global bind unix@/run/haproxy-444_2-h2test.sock id 444 accept-proxy user haproxy group haproxy mode 600 bind :80 ... So what I'm doing is: curl -k4vs https://127.0.0.1:444/~idl0r/ --http1.1 curl -k4vs https://127.0.0.1:444/~idl0r/ --http2 So with HTTP/1.1 I get: public_http backend_qasl_de/qasl1 0/0/0/0/0 200 510 - - 3/1/0/0/0 0/0 {127.0.0.1:444|curl/7.64.1|} "GET / HTTP/1.1" h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 5/1/6 605 -- 2/1/0/0/0 0/0 h2test_tcp h2test_tcp/socket-444-h2test 1/0/6 3335 CD 1/1/0/0/0 0/0 With H/2: public_http public_http/ -1/-1/-1/-1/0 400 187 - - PR-- 3/1/0/0/0 0/0 {||} "" h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 6/0/5 187 SD 2/1/0/0/0 0/0 h2test_tcp h2test_tcp/socket-444-h2test 1/0/5 2911 SD 1/1/0/0/0 0/0 curl says: # curl -k4vs https://127.0.0.1:444/ --http2 * Trying 127.0.0.1... 
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=...
* start date: Mar 1 18:00:17 2019 GMT
* expire date: May 30 18:00:17 2019 GMT
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56087e29b770)
GET / HTTP/2
Host: 127.0.0.1:444
User-Agent: curl/7.64.1
Accept: */*
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame. Perhaps, peer does not support HTTP/2 properly.
* Connection #0 to host 127.0.0.1 left intact
* Closing connection 0

Can anybody else confirm that? Tested with HAProxy 1.9.6. Any ideas what might be the reason? Right now, I'd guess that's a problem with H/2 and those sockets on the HAProxy side.

--
Regards,
Christian Ruppert
Re: `stats bind-process` broken
Hey Guys, I can confirm those issues as well as the proposed fix/workaround to solve the issue. I upgraded our "nbproc" setup from 1.7.x to 1.9.6 today and noticed some missing entries from the stats socket, e.g.: # echo 'show stat' | socat stdio /run/haproxy.stat|wc -l 1442 Which is correct, while after the upgrade it was indeed showing stats from a random proc: # echo 'show stat' | socat stdio /run/haproxy.stat|wc -l 341 etc. Adding the "process 1" to the "stats socket" line seems to help. On 2019-04-11 18:24, Willy Tarreau wrote: Hi Patrick, On Thu, Apr 11, 2019 at 12:18:14PM -0400, Patrick Hemmer wrote: With haproxy 1.9.6 the `stats bind-process` directive is not working. Every connection to the socket is going to a random process: Here's a simple reproduction: Config: global nbproc 3 stats socket /tmp/haproxy.sock level admin stats bind-process 1 Testing: # for i in {1..5}; do socat - unix:/tmp/haproxy.sock <<< "show info" | grep Pid: ; done Pid: 33371 Pid: 33373 Pid: 33372 Pid: 33373 Pid: 33373 This must be pretty annoying. I don't have memories of anything changed regarding the bind-process stuff between 1.8 and 1.9 (the threads have moved a lot however). It could be a side effect of some of these changes though. However I'm seeing that adding "process 1" on the "stats socket" line itself fixes the problem. I suspect the issue is located in the propagation of the frontend's mask to the listener, I'll look at this. Thanks! Willy -Patrick -- Regards, Christian Ruppert
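The workaround discussed in this thread boils down to pinning the socket itself with an explicit "process" keyword, in addition to (or instead of) relying on "stats bind-process". A minimal sketch based on Patrick's reproducer config:

```
global
    nbproc 3
    # explicit "process 1" on the socket line works around the
    # bind-process mask not being propagated to the listener in 1.9.6
    stats socket /tmp/haproxy.sock level admin process 1
    stats bind-process 1
```

With this, every connection to /tmp/haproxy.sock should land on process 1 and "show stat" returns consistent results.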
Re: haproxy and solarflare onload
Oh, btw, I'm just reading that onload documentation:

Filters
Filters are used to deliver packets received from the wire to the appropriate application. When filters are exhausted it is not possible to create new accelerated sockets. The general recommendation is that applications do not allocate more than 4096 filters - or applications should not create more than 4096 outgoing connections. The limit does not apply to inbound connections to a listening socket.

On 2017-12-20 13:11, Christian Ruppert wrote: Hi Elias, I'm currently preparing a test setup including a SFN8522 + onload. How did you measure it? When did those errors (drops/discard?) appear, during a test or some real traffic? The first thing I did was updating the driver + firmware. Are both up to date in your case? I haven't measured / compared the SFN8522 against a X520 nor X710 yet, but do you have RSS / affinity or something related enabled/set? Intel has some features and Solarflare may have its own stuff. On 2017-12-20 11:48, Elias Abacioglu wrote: Hi, Yes on the LD_PRELOAD. Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s, currently without Onload enabled. It has 17.5K http_request_rate and ~26% server interrupts on core 0 and 1 where the NIC IRQ is bound to. And I have a similar node with Intel X710 2p 10Gbit/s. It has 26.1K http_request_rate and ~26% server interrupts on core 0 and 1 where the NIC IRQ is bound to. Both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM. So without Onload Solarflare performs worse than the X710 since it has the same amount of SI load with less traffic. And a side note is that I haven't compared the ethtool settings between Intel and Solarflare, just running with the defaults of both cards. I currently have a support ticket open with the Solarflare team about the issues I mentioned in my previous mail; if they sort that out I can perhaps set up a test server if I can manage to free up one server.
Then we can do some synthetic benchmarks with a set of parameters of your choosing. Regards, /Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau wrote: Hi Elias, On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote: Hi, I recently bought a solarflare NIC with (ScaleOut) Onload / OpenOnload to test it with HAproxy. Has anyone tried running haproxy with solarflare onload functions? After I started haproxy with onload, this started spamming the kernel log:

Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 10.3.20.116:80 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 10.3.20.113:80 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)

And this in the haproxy log:

Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.

Apparently I've hit the max hardware filter limit on the card. Does anyone here have experience in running haproxy with onload features? I've never got any report of any such test, though in the past I thought it would be nice to run such a test, at least to validate the perimeter covered by the library (you're using it as LD_PRELOAD, that's it ?).
Mind sharing insights and advice on how to get a functional setup? I really don't know what can reasonably be expected from code trying to partially bypass a part of the TCP stack, to be honest. From what I've read a long time ago, onload might be doing its work in a not very intrusive way but judging by your messages above I'm having some doubts now. Have you tried without this software, using the card normally ? I mean, 2 years ago I had the opportunity to test haproxy on a dual-40G setup and we reached 60 Gbps of forwarded traffic with all machines in the test bench reaching their limits (and haproxy reaching 100% as well), so for me that proves that the TCP stack still scales extremely well and that while such acceleration software might make sense for a next generation NIC running on old hardware (eg: when 400 Gbps NICs start to appear), I'm really not convinced that it makes any sense to use them on well supported
Re: haproxy and solarflare onload
Hi Elias, I'm currently preparing a test setup including a SFN8522 + onload. How did you measure it? When did those errors (drops/discard?) appear, during a test or some real traffic? The first thing I did was updating the driver + firmware. Are both up to date in your case? I haven't measured / compared the SFN8522 against a X520 nor X710 yet, but do you have RSS / affinity or something related enabled/set? Intel has some features and Solarflare may have its own stuff.

On 2017-12-20 11:48, Elias Abacioglu wrote: Hi, Yes on the LD_PRELOAD. Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s, currently without Onload enabled. It has 17.5K http_request_rate and ~26% server interrupts on core 0 and 1 where the NIC IRQ is bound to. And I have a similar node with Intel X710 2p 10Gbit/s. It has 26.1K http_request_rate and ~26% server interrupts on core 0 and 1 where the NIC IRQ is bound to. Both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM. So without Onload Solarflare performs worse than the X710 since it has the same amount of SI load with less traffic. And a side note is that I haven't compared the ethtool settings between Intel and Solarflare, just running with the defaults of both cards. I currently have a support ticket open with the Solarflare team about the issues I mentioned in my previous mail; if they sort that out I can perhaps set up a test server if I can manage to free up one server. Then we can do some synthetic benchmarks with a set of parameters of your choosing. Regards, /Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau wrote: Hi Elias, On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote: Hi, I recently bought a solarflare NIC with (ScaleOut) Onload / OpenOnload to test it with HAproxy. Has anyone tried running haproxy with solarflare onload functions?
After I started haproxy with onload, this started spamming the kernel log:

Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 10.3.20.116:80 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 10.3.20.113:80 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 10.3.20.30:445 failed (-16)

And this in the haproxy log:

Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.

Apparently I've hit the max hardware filter limit on the card. Does anyone here have experience in running haproxy with onload features? I've never got any report of any such test, though in the past I thought it would be nice to run such a test, at least to validate the perimeter covered by the library (you're using it as LD_PRELOAD, that's it ?). Mind sharing insights and advice on how to get a functional setup? I really don't know what can reasonably be expected from code trying to partially bypass a part of the TCP stack, to be honest. From what I've read a long time ago, onload might be doing its work in a not very intrusive way but judging by your messages above I'm having some doubts now.
Have you tried without this software, using the card normally ? I mean, 2 years ago I had the opportunity to test haproxy on a dual-40G setup and we reached 60 Gbps of forwarded traffic with all machines in the test bench reaching their limits (and haproxy reaching 100% as well), so for me that proves that the TCP stack still scales extremely well and that while such acceleration software might make sense for a next generation NIC running on old hardware (eg: when 400 Gbps NICs start to appear), I'm really not convinced that it makes any sense to use them on well supported setups like 2-4 10Gbps links which are very common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years ago on a core2-duo! Hardware has evolved quite a bit since :-) Regards, Willy

--
Regards,
Christian Ruppert
[PATCH] Fix linking / LDFLAGS order of some contrib modules in 1.8
Hi Willy, please see the attached patch that fixes the linking / LDFLAGS order of some contrib modules, which is relevant e.g. when linking with -Wl,--as-needed.

--
Regards,
Christian Ruppert

From c702537864f7e062d18f4ccce3e29d14d4ccf05f Mon Sep 17 00:00:00 2001
From: Christian Ruppert
Date: Thu, 30 Nov 2017 10:11:36 +0100
Subject: [PATCH] Fix LDFLAGS vs. LIBS re linking order

Signed-off-by: Christian Ruppert
---
 contrib/mod_defender/Makefile | 5 ++---
 contrib/modsecurity/Makefile  | 5 ++---
 contrib/spoa_example/Makefile | 5 ++---
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/contrib/mod_defender/Makefile b/contrib/mod_defender/Makefile
index ac17774d..efc7d7f6 100644
--- a/contrib/mod_defender/Makefile
+++ b/contrib/mod_defender/Makefile
@@ -28,9 +28,8 @@ EVENT_INC := /usr/include
 endif

 CFLAGS += -g -Wall -pthread
-LDFLAGS += -lpthread $(EVENT_LIB) -levent_pthreads -lapr-1 -laprutil-1 -lstdc++ -lm
 INCS += -I../../include -I../../ebtree -I$(MOD_DEFENDER_SRC) -I$(APACHE2_INC) -I$(APR_INC) -I$(EVENT_INC)
-LIBS =
+LIBS += -lpthread $(EVENT_LIB) -levent_pthreads -lapr-1 -laprutil-1 -lstdc++ -lm

 CXXFLAGS = -g -std=gnu++11
 CXXINCS += -I$(MOD_DEFENDER_SRC) -I$(MOD_DEFENDER_SRC)/deps -I$(APACHE2_INC) -I$(APR_INC)
@@ -43,7 +42,7 @@ CXXSRCS = $(wildcard $(MOD_DEFENDER_SRC)/*.cpp)
 CXXOBJS = $(patsubst %.cpp, %.o, $(CXXSRCS))

 defender: $(OBJS) $(CXXOBJS)
-	$(LD) -o $@ $^ $(LDFLAGS) $(LIBS)
+	$(LD) $(LDFLAGS) -o $@ $^ $(LIBS)

 install: defender
 	install defender $(DESTDIR)$(BINDIR)
diff --git a/contrib/modsecurity/Makefile b/contrib/modsecurity/Makefile
index bb918c30..aa0d6e38 100644
--- a/contrib/modsecurity/Makefile
+++ b/contrib/modsecurity/Makefile
@@ -34,14 +34,13 @@ EVENT_INC := /usr/include
 endif

 CFLAGS += -g -Wall -pthread
-LDFLAGS += -lpthread $(EVENT_LIB) -levent_pthreads -lcurl -lapr-1 -laprutil-1 -lxml2 -lpcre -lyajl
 INCS += -I../../include -I../../ebtree -I$(MODSEC_INC) -I$(APACHE2_INC) -I$(APR_INC) -I$(LIBXML_INC) -I$(EVENT_INC)
-LIBS =
+LIBS += -lpthread $(EVENT_LIB) -levent_pthreads -lcurl -lapr-1 -laprutil-1 -lxml2 -lpcre -lyajl

 OBJS = spoa.o modsec_wrapper.o

 modsecurity: $(OBJS)
-	$(LD) $(LDFLAGS) $(LIBS) -o $@ $^ $(MODSEC_LIB)/standalone.a
+	$(LD) $(LDFLAGS) -o $@ $^ $(MODSEC_LIB)/standalone.a $(LIBS)

 install: modsecurity
 	install modsecurity $(DESTDIR)$(BINDIR)
diff --git a/contrib/spoa_example/Makefile b/contrib/spoa_example/Makefile
index d04a01e1..c44c2b87 100644
--- a/contrib/spoa_example/Makefile
+++ b/contrib/spoa_example/Makefile
@@ -6,15 +6,14 @@ CC = gcc
 LD = $(CC)

 CFLAGS = -g -O2 -Wall -Werror -pthread
-LDFLAGS = -lpthread -levent -levent_pthreads
 INCS += -I../../ebtree -I./include
-LIBS =
+LIBS = -lpthread -levent -levent_pthreads

 OBJS = spoa.o

 spoa: $(OBJS)
-	$(LD) $(LDFLAGS) $(LIBS) -o $@ $^
+	$(LD) $(LDFLAGS) -o $@ $^ $(LIBS)

 install: spoa
 	install spoa $(DESTDIR)$(BINDIR)
--
2.13.6
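For context on why the order matters: with -Wl,--as-needed the linker only keeps a shared library if an undefined symbol referencing it has already been seen on the command line, so libraries listed before the object files get silently dropped and linking fails. A generic rule following the convention the patch applies (names here are illustrative, not from the patch) looks like:

```make
# Hypothetical link rule: flags first, then the objects,
# then the libraries the objects depend on.
prog: $(OBJS)
	$(LD) $(LDFLAGS) -o $@ $^ $(LIBS)
```

Keeping link-time flags in LDFLAGS and libraries in LIBS is what makes this ordering possible in the first place.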
Removing the cpu-map cpu/-set limit?
Hi, there is currently a limit for both the process number itself and the cpu/-set in "cpu-map". Wouldn't it make sense to lift the cpu/-set limit? Example: I've got an 18 Core / 36 Thread (72 with HT) CPU. Now I want to use only the non-HT threads (e.g. all even threads), for which I may have to go beyond the limit of 64, or use just the ones >64, which is currently not possible. Is there any reason to limit the cpu/-set value instead of just the number of processes?

--
Regards,
Christian Ruppert
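While cpu-map itself is capped, an affinity mask for the even-numbered CPUs can still be applied from outside, e.g. via taskset. A small sketch, under the assumption that the HT siblings are the odd-numbered CPUs (worth verifying via /sys/devices/system/cpu/cpuN/topology/thread_siblings_list on the actual box):

```python
def even_cpu_mask(n_cpus: int) -> int:
    """Bitmask selecting CPUs 0, 2, 4, ... below n_cpus."""
    return sum(1 << cpu for cpu in range(0, n_cpus, 2))

# 72 hardware threads -> 36 even-numbered CPUs;
# usable as: taskset <mask> haproxy -f /etc/haproxy/haproxy.cfg ...
print(hex(even_cpu_mask(72)))  # 0x555555555555555555
```

Python is used here only because shell arithmetic is limited to 64-bit values, which is exactly the range this mask exceeds.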
Re: SSL/ECC and nbproc >1
On 2016-11-25 15:26, Willy Tarreau wrote: On Fri, Nov 25, 2016 at 02:44:35PM +0100, Christian Ruppert wrote: I have a default bind for process 1 which is basically the http frontend and the actual backend, RSA is bound to another, single process and ECC is bound to all the rest. So in this case SSL (in particular ECC) is the problem. The connections/handshakes should be *actually* using CPU+2 till NCPU. That's exactly what I'm talking about, look, you have this :

frontend ECC
    bind-process 3-36
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC
    mode http
    default_backend bk_ram

It creates a single socket (hence a single queue) and shares it between all processes. Thus each incoming connection will wake up all processes not doing anything, and the first one capable of grabbing it will take it as well as a few following ones if any. You end up with a very unbalanced load making it hard to scale. Instead you can do this :

frontend ECC
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 3
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 4
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 5
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 6
    ...
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 36
    mode http
    default_backend bk_ram

You'll really have 34 listening sockets all fairly balanced with their own queue. You can generally achieve higher loads this way and with a lower average latency. Also, I tend to bind network IRQs to the same cores as those doing SSL because you hardly have the two at once. SSL is not able to deal with traffic capable of saturating a NIC driver, so when SSL saturates the CPU you have little traffic and when the NIC requires all the CPU for high traffic, you know there's little SSL. Cheers, Willy

Ah! Thanks! I had to remove the default "bind-process 1" or also set the "bind-process 3-36" in the ECC frontend though. I guess it's the same in the end. Anyway, the IRQ/NIC problem was still the same.
I'll set it up that way anyway if that's better, together with the Intel affinity script or, as you said, with IRQs bound to the related core that does SSL. Let's see how well that performs.

--
Regards,
Christian Ruppert
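Since 34 near-identical bind lines are tedious to maintain by hand, they can be generated; a small sketch using the port, cert path and process range from the test config in this thread:

```python
# Emit one "bind ... process N" line per ECC process (3..36), to be pasted
# into the frontend instead of a single shared bind line.
for p in range(3, 37):
    print(f"    bind :65420 ssl crt /etc/haproxy/test.pem-ECC process {p}")
```

The same idea works for any per-process bind fan-out; only the port, certificate path and process range need adjusting.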
Re: SSL/ECC and nbproc >1
On 2016-11-25 14:44, Christian Ruppert wrote: Hi Willy, On 2016-11-25 14:30, Willy Tarreau wrote: Hi Christian, On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote: I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make much of a difference so far. I also tried (in this case) to disable HT entirely and set it to max. 36 procs. Basically the same as before. Also you definitely need to split your bind lines, one per process, to take advantage of the kernel's ability to load balance between multiple queues. Otherwise the load is always unequal and many processes are woken up for nothing. I have a default bind for process 1 which is basically the http frontend and the actual backend, RSA is bound to another, single process and ECC is bound to all the rest. So in this case SSL (in particular ECC) is the problem. The connections/handshakes should be *actually* using CPU+2 till NCPU. The only shared part should be the backend but that should be actually no problem for e.g. 5 parallel benchmarks as a single HTTP benchmark can make >20k requests/s.

global
    nbproc 36

defaults:
    bind-process 1

frontend http
    bind :65410
    mode http
    default_backend bk_ram

frontend ECC
    bind-process 3-36
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC
    mode http
    default_backend bk_ram

backend bk_ram
    mode http
    fullconn 75000
    errorfile 503 /etc/haproxy/test.error

Regards, Willy

It seems to be the NIC or rather the driver/kernel. Using Intel's set_irq_affinity (set_irq_affinity -x local eth2 eth3) seems to do the trick, at least at first glance.

--
Regards,
Christian Ruppert
Re: SSL/ECC and nbproc >1
Hi Willy, On 2016-11-25 14:30, Willy Tarreau wrote: Hi Christian, On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote: I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make much of a difference so far. I also tried (in this case) to disable HT entirely and set it to max. 36 procs. Basically the same as before. Also you definitely need to split your bind lines, one per process, to take advantage of the kernel's ability to load balance between multiple queues. Otherwise the load is always unequal and many processes are woken up for nothing. I have a default bind for process 1 which is basically the http frontend and the actual backend, RSA is bound to another, single process and ECC is bound to all the rest. So in this case SSL (in particular ECC) is the problem. The connections/handshakes should be *actually* using CPU+2 till NCPU. The only shared part should be the backend but that should be actually no problem for e.g. 5 parallel benchmarks as a single HTTP benchmark can make >20k requests/s.

global
    nbproc 36

defaults:
    bind-process 1

frontend http
    bind :65410
    mode http
    default_backend bk_ram

frontend ECC
    bind-process 3-36
    bind :65420 ssl crt /etc/haproxy/test.pem-ECC
    mode http
    default_backend bk_ram

backend bk_ram
    mode http
    fullconn 75000
    errorfile 503 /etc/haproxy/test.error

Regards, Willy

--
Regards,
Christian Ruppert
Re: SSL/ECC and nbproc >1
Hi Conrad, On 2016-10-21 17:39, Conrad Hoffmann wrote: Hi, it's a lot of information, and I don't have time to go into all details right now, but from a quick read, here are the things I noticed: - Why nbproc 64? Your CPU has 18 cores (36 w/ HT), so more procs than that will likely make performance rather worse. HT cores share the cache, so using 18 might make most sense (see also below). It's best to experiment a little with that and measure the results, though. I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make much of a difference so far. I also tried (in this case) to disable HT entirely and set it to max. 36 procs. Basically the same as before. - If you see ksoftirq eating up a lot of one CPU, then your box is most likely configured to process all IRQs on the first core. Most NICs these days can be configured to use several IRQs, which you can then distribute across all cores, smoothing the workload across cores significantly. I'll try to get a more recent distro (it's still a Debian Wheezy) with a newer driver etc. They seem to have added some IRQ options in more recent versions of ixgbe. The kernel could also be related. So disabling HT did not help. nginx seems to have a similar problem btw. so it's neither HAProxy nor nginx I guess. - Consider using "bind-process" to lock the processes to a single core (but make sure to leave out the HT cores, or disable HT altogether); less context switching might improve performance. Hope that helps, Conrad On 10/21/2016 04:47 PM, Christian Ruppert wrote: Hi, again a performance topic. I did some further testing/benchmarks with ECC and nbproc >1. I was testing on a "E5-2697 v4" and the first thing I noticed was that HAProxy has a fixed limit of 64 for nbproc. 
So the setup: HAProxy server with the mentioned E5: global user haproxy group haproxy maxconn 75000 log 127.0.0.2 local0 ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDH ssl-default-bind-options no-sslv3 no-tls-tickets tune.ssl.default-dh-param 1024 nbproc 64 defaults timeout client 300s timeout server 300s timeout queue 60s timeout connect 7s timeout http-request 10s maxconn 75000 bind-process 1 # HTTP frontend haproxy_test_http bind :65410 mode http option httplog option httpclose log global default_backend bk_ram # ECC frontend haproxy_test-ECC bind-process 3-64 bind :65420 ssl crt /etc/haproxy/test.pem-ECC mode http option httplog option httpclose log global default_backend bk_ram backend bk_ram mode http fullconn 75000 # Just in case the lower default limit will be reached... errorfile 503 /etc/haproxy/test.error /etc/haproxy/test.error: HTTP/1.0 200 Cache-Control: no-cache Connection: close Content-Type: text/plain Test123456 The ECC key: openssl ecparam -genkey -name prime256v1 -out /etc/haproxy/test.pem-ECC.key openssl req -new -sha256 -key /etc/haproxy/test.pem-ECC.key -days 365 -nodes -x509 -sha256 -subj "/O=ECC Test/CN=test.example.com" -out /etc/haproxy/test.pem-ECC.crt cat /etc/haproxy/test.pem-ECC.key /etc/haproxy/test.pem-ECC.crt > /etc/haproxy/test.pem-ECC So then I tried a local "ab": ab -n 5000 -c 250 https://127.0.0.1:65420/ Server Hostname:127.0.0.1 Server Port:65420 SSL/TLS Protocol: TLSv1/SSLv3,ECDHE-ECDSA-AES128-GCM-SHA256,256,128 Document Path: / Document Length:107 bytes Concurrency Level: 250 Time taken for tests: 3.940 seconds Complete requests: 5000 Failed requests:0 Write errors: 0 Non-2xx responses: 5000 Total transferred: 106 bytes HTML transferred: 535000 bytes Requests per second:1268.95 [#/sec] (mean) Time per request: 197.013 [ms] (mean) Time per request: 0.788 [ms] (mean, 
across all concurrent requests) Transfer rate: 262.71 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 54 138 34.7 162 193 Processing: 8 51 34.8 24 157 Waiting: 3 40 31.6 18 113 Total: 177 189 7.5 188 333 Percentage of the requests served within a certain time (ms) 50% 188 66% 189 75% 190 80% 190 90% 191 95% 192 98% 196 99% 205 100% 333 (longest request) The same test with just nbproc 1 was about ~1500 requests/s. So 1.5k * nbproc would have been what I expected, at least somewhere near that value. Then I set up 61 EC2 instances, standard setup t2-micro. They're somewhat slower with ~1k ECC requests per second but that's ok for the test. HTTP (one proc) via localhos
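The scaling expectation stated above ("1.5k * nbproc") can be written down explicitly. With the quoted config the ECC frontend runs on processes 3-64, so the linear ceiling would be (the per-process rate is from the mail; the arithmetic is the only addition):

```shell
# Back-of-the-envelope: linear-scaling ceiling for the ECC frontend.
per_proc=1500     # ~req/s measured with nbproc 1
ecc_procs=62      # processes 3..64 in the quoted config
expected=$((per_proc * ecc_procs))
echo "linear ceiling: ${expected} requests/s"
```

The measured ~1269 req/s from a single local "ab" is far below that ceiling, which is consistent with the client, not haproxy, being the bottleneck in that first test.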
SSL/ECC and nbproc >1
hms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip") Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013 Running on OpenSSL version : OpenSSL 1.0.1t 3 May 2016 (VERSIONS DIFFER!) OpenSSL library supports TLS extensions : yes OpenSSL library supports SNI : yes OpenSSL library supports prefer-server-ciphers : yes Built with PCRE version : 8.30 2012-02-04 PCRE library supports JIT : no (USE_PCRE_JIT not set) Built without Lua support Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result OK Total: 3 (3 usable), will use epoll. I actually thought I was using 1.6.9 on that host already so I just upgraded and tried some benchmarks again, but it looks like it's almost equal at first glance. -- Regards, Christian Ruppert
Stale UNIX sockets after reload
Hi, it seems that HAProxy does not remove the UNIX sockets after reloading (also restarting?) even though they have been removed from the configuration and thus are stale afterwards. At least 1.6.4 seems to be affected. Can anybody else confirm that? It's a multi-process setup in this case but it also happens with binds bound to just one process. -- Regards, Christian Ruppert
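A hedged way to spot such leftovers from the outside is to compare the *.sock files on disk against the binds in the running config. The paths and filenames below are made up for the demonstration:

```shell
# Demo in a temp dir: one socket still referenced by the config, one stale.
run=$(mktemp -d)
cfg="$run/haproxy.cfg"
touch "$run/keep.sock" "$run/stale.sock"
echo "bind unix@$run/keep.sock" > "$cfg"
for s in "$run"/*.sock; do
    grep -q "$s" "$cfg" || echo "stale socket: $s"
done
```

In a real setup the loop would run over the actual socket directory and the actual config; removing the reported files is then safe after a reload, since the new processes no longer bind them.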
Re: Sharing SSL information via PROXY protocol or HAProxy internally
Hi Dennis, On 2016-04-16 02:13, Dennis Jacobfeuerborn wrote: On 15.04.2016 16:01, Christian Ruppert wrote: Hi, would it be possible to inherit the SSL information from a SSL listener/frontend via PROXY protocol? So for example: listen ssl-relay mode tcp ... server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2 listen ssl-rsa_ecc mode tcp ... bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt SSl-RSA.PEM user haproxy frontend http_https bind :80 # http bind unix@/var/run/haproxy_ssl.sock accept-proxy user haproxy # https redirect scheme https code 301 if !{ssl_fc} Here the ssl_fc and other SSL related ACLs do not work because the actual SSL termination has been done in the above ssl-rsa_ecc listener. Sharing that either internally or via the PROXY protocol would be really handy, if that's possible. For now we use the bind "id" to check whether it's the proxy connection or not but the above would be much easier/better IMHO. For this specific case of http to https redirect I use the X-Forwarded-Proto header. In the ssl frontend I do this: http-request set-header X-Forwarded-Proto https and in the plain http frontend I do this: http-request redirect scheme https if !{ req.hdr(X-Forwarded-Proto) https } The problem here is that a client could set that header in a plain http request as well and bypass the redirects and whatever else depends on that decision. You may also want the other SSL data, cipher, version etc. Since 1.6 you can set variables, ok, but passing that kind of information along could be really useful I guess. You usually need to set this header anyway so the application knows it needs to generate https URLs in the generated HTML. Regards, Dennis -- Regards, Christian Ruppert
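The spoofing concern raised above can be mitigated by stripping the client-supplied header before trusting it — a sketch (frontend names and certificate path are illustrative):

```
frontend http_plain
    bind :80
    # never trust a client-supplied X-Forwarded-Proto on the plain bind
    http-request del-header X-Forwarded-Proto
    http-request redirect scheme https code 301

frontend https_term
    bind :443 ssl crt /etc/haproxy/site.pem
    http-request set-header X-Forwarded-Proto https
```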
Sharing SSL information via PROXY protocol or HAProxy internally
Hi, would it be possible to inherit the SSL information from a SSL listener/frontend via PROXY protocol? So for example: listen ssl-relay mode tcp ... server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2 listen ssl-rsa_ecc mode tcp ... bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt SSl-RSA.PEM user haproxy frontend http_https bind :80 # http bind unix@/var/run/haproxy_ssl.sock accept-proxy user haproxy # https redirect scheme https code 301 if !{ssl_fc} Here the ssl_fc and other SSL related ACLs do not work because the actual SSL termination has been done in the above ssl-rsa_ecc listener. Sharing that either internally or via the PROXY protocol would be really handy, if that's possible. For now we use the bind "id" to check whether it's the proxy connection or not but the above would be much easier/better IMHO. -- Regards, Christian Ruppert
Re: nbproc 1 vs >1 performance
On 2016-04-14 11:06, Christian Ruppert wrote: Hi Willy, On 2016-04-14 10:17, Willy Tarreau wrote: On Thu, Apr 14, 2016 at 08:55:47AM +0200, Lukas Tribus wrote: Le me put it this way: frontend haproxy_test bind-process 1-8 bind :12345 process 1 bind :12345 process 2 bind :12345 process 3 bind :12345 process 4 Leads to 8 processes, and the master process binds the socket 4 times (PID 16509): (...) lukas@ubuntuvm:~/haproxy-1.5$ sudo netstat -tlp | grep hap tcp0 0 *:12345 *:* LISTEN 16509/haproxy tcp0 0 *:12345 *:* LISTEN 16509/haproxy tcp0 0 *:12345 *:* LISTEN 16509/haproxy tcp0 0 *:12345 *:* LISTEN 16509/haproxy lukas@ubuntuvm:~/haproxy-1.5$ OK so it's netstat which gives a wrong report, I have the same here. I verified in /proc/$PID/fd/ and I properly saw the FDs. Next, "ss -anp" also shows all the process list : LISTEN 0 128 *:12345 *:* users:(("haproxy",25360,7),("haproxy",25359,7),("haproxy",25358,7),("haproxy",25357,7),("haproxy",25356,7),("haproxy",25355,7),("haproxy",25354,7),("haproxy",25353,7)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25360,6),("haproxy",25359,6),("haproxy",25358,6),("haproxy",25357,6),("haproxy",25356,6),("haproxy",25355,6),("haproxy",25354,6),("haproxy",25353,6)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25360,5),("haproxy",25359,5),("haproxy",25358,5),("haproxy",25357,5),("haproxy",25356,5),("haproxy",25355,5),("haproxy",25354,5),("haproxy",25353,5)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25360,4),("haproxy",25359,4),("haproxy",25358,4),("haproxy",25357,4),("haproxy",25356,4),("haproxy",25355,4),("haproxy",25354,4),("haproxy",25353,4)) A performance test also shows a fair distribution of the load : 25353 willy 20 0 21872 4216 1668 S 26 0.1 0:04.54 haproxy 25374 willy 20 0 7456 1080 S 25 0.0 0:02.26 injectl464 25376 willy 20 0 7456 1080 S 25 0.0 0:02.27 injectl464 25377 willy 20 0 7456 1080 S 25 0.0 0:02.26 injectl464 25375 willy 20 0 7456 1080 S 24 0.0 0:02.26 injectl464 25354 willy 20 0 21872 4168 1620 R 22 0.1 0:04.51 haproxy 
25356 willy 20 0 21872 4216 1668 R 22 0.1 0:04.21 haproxy 25355 willy 20 0 21872 4168 1620 S 21 0.1 0:04.38 haproxy However, as you can see these sockets are still bound to all processes and that's not a good idea in the multi-queue mode. I have added a few debug lines in enable_listener() like this : $ git diff diff --git a/src/listener.c b/src/listener.c index 5abeb80..59c51a1 100644 --- a/src/listener.c +++ b/src/listener.c @@ -49,6 +49,7 @@ static struct bind_kw_list bind_keywords = { */ void enable_listener(struct listener *listener) { + fddebug("%d: enabling fd %d\n", getpid(), listener->fd); if (listener->state == LI_LISTEN) { if ((global.mode & (MODE_DAEMON | MODE_SYSTEMD)) && listener->bind_conf->bind_proc && @@ -57,6 +58,7 @@ void enable_listener(struct listener *listener) * want any fd event to reach it. */ fd_stop_recv(listener->fd); + fddebug("%d: pausing fd %d\n", getpid(), listener->fd); listener->state = LI_PAUSED; } else if (listener->nbconn < listener->maxconn) { And we're seeing this upon startup for processes 25746..25755 : Thus as you can see that FDs are properly enabled and paused for the unavailable ones. willy@wtap:haproxy$ grep 4294967295 log | grep 25746 25746 write(4294967295, "25746: enabling fd 4\n", 21 25746 write(4294967295, "25746: enabling fd 5\n", 21 25746 write(4294967295, "25746: pausing fd 5\n", 20) = -1 EBADF (Bad file descriptor) 25746 write(4294967295, "25746: enabling fd 6\n", 21) = -1 EBADF (Bad file descriptor) 25746 write(4294967295, "25746: pausing fd 6\n", 20) = -1 EBADF (Bad file descriptor) 25746 write(4294967295, "25746: enabling fd 7\n", 21 25746 write(4294967295, "25746: pausing fd 7\n", 20 willy@wtap:haproxy$ grep 4294967295 log | grep 25747 25747 write(4294967295, "25747: enabling fd 4\n", 21 25747 write(4294967295, "25747: pausing fd 4\n", 20 25747 w
Re: nbproc 1 vs >1 performance
On 2016-04-14 11:41, Willy Tarreau wrote: Hi Christian, On Thu, Apr 14, 2016 at 11:06:02AM +0200, Christian Ruppert wrote: I've applied your patch and I just looked at the performance so far. The performance is still the same, so the lessperformant one is still less performant than the moreperformant.cfg. So from the performance point of view there's no difference between with and without that patch. We have a (IMHO) quite huge haproxy config with around 200 frontends, ~180 backends and ~90 listener. So the intention was to combine a http bind with a ssl bind in one frontend so that we keep it at a minimum that is necessary to get it working properly: listen ssl-relay mode tcp bind-process 2 bind :443 process 2 tcp-request inspect-delay 7s acl HAS_ECC req.ssl_ec_ext eq 1 tcp-request content accept if { req_ssl_hello_type 1 } # Client Hello use-server ecc if HAS_ECC server ecc unix@/var/run/haproxy_ssl_ecc.sock send-proxy-v2 use-server rsa if !HAS_ECC server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2 frontend http_https_combined mode http bind-process 1-40 bind :80 process 1 bind unix@/var/run/haproxy_ssl_ecc.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-ECC user haproxy process 4-40 bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-RSA user haproxy process 3 ... default_backend somebackend OK I know why. It's because there's a per-listener "maxaccept" setting which divides the global.tune.maxaccept value by 2 then by the number of processes a listener is bound to. But in fact it divides by the number of processes the frontend is bound to. In the past this used to be efficient but since the "process" directive on frontends it doesn't make sense anymore. So in practice above you end up with "maxaccept 1". You could try to work around it by setting "tune.maxaccept 4000" in your global section, you'll get the original performance back. Yep, that did it. 
With this setting there is no more performance decrease on the http bind. Thanks! I'm just not sure if that will (negatively) affect anything else. I have to change this obviously, so that we consider the intersection between the listener and the frontend and not the frontend only. I'm looking at this right now. Willy -- Regards, Christian Ruppert
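For reference, the workaround discussed in this thread is a single line in the global section (the value is the one from Willy's mail; its side effects on other workloads are untested):

```
global
    # keep per-listener accept batches large even when a frontend
    # is bound to many processes
    tune.maxaccept 4000
```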
Re: nbproc 1 vs >1 performance
", 21 25747 write(4294967295, "25747: enabling fd 6\n", 21 25747 write(4294967295, "25747: pausing fd 6\n", 20 25747 write(4294967295, "25747: enabling fd 7\n", 21 25747 write(4294967295, "25747: pausing fd 7\n", 20 willy@wtap:haproxy$ grep 4294967295 log | grep 25748 25748 write(4294967295, "25748: enabling fd 4\n", 21 25748 write(4294967295, "25748: pausing fd 4\n", 20 25748 write(4294967295, "25748: enabling fd 5\n", 21 25748 write(4294967295, "25748: pausing fd 5\n", 20 25748 write(4294967295, "25748: enabling fd 6\n", 21 25748 write(4294967295, "25748: enabling fd 7\n", 21 25748 write(4294967295, "25748: pausing fd 7\n", 20 willy@wtap:haproxy$ grep 4294967295 log | grep 25749 25749 write(4294967295, "25749: enabling fd 4\n", 21 25749 write(4294967295, "25749: pausing fd 4\n", 20 25749 write(4294967295, "25749: enabling fd 5\n", 21 25749 write(4294967295, "25749: pausing fd 5\n", 20 25749 write(4294967295, "25749: enabling fd 6\n", 21 25749 write(4294967295, "25749: pausing fd 6\n", 20 25749 write(4294967295, "25749: enabling fd 7\n", 21 willy@wtap:haproxy$ grep 4294967295 log | grep 25750 25750 write(4294967295, "25750: enabling fd 4\n", 21 25750 write(4294967295, "25750: pausing fd 4\n", 20 25750 write(4294967295, "25750: enabling fd 5\n", 21 25750 write(4294967295, "25750: pausing fd 5\n", 20 25750 write(4294967295, "25750: enabling fd 6\n", 21 25750 write(4294967295, "25750: pausing fd 6\n", 20 25750 write(4294967295, "25750: enabling fd 7\n", 21 25750 write(4294967295, "25750: pausing fd 7\n", 20 Now with the following patch to completely unbind such listeners : diff --git a/src/listener.c b/src/listener.c index 5abeb80..0296d50 100644 --- a/src/listener.c +++ b/src/listener.c @@ -56,8 +57,7 @@ void enable_listener(struct listener *listener) /* we don't want to enable this listener and don't * want any fd event to reach it. 
*/ - fd_stop_recv(listener->fd); - listener->state = LI_PAUSED; + unbind_listener(listener); } else if (listener->nbconn < listener->maxconn) { fd_want_recv(listener->fd); I get this which is much cleaner : LISTEN 0 128 *:12345 *:* users:(("haproxy",25949,7)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25948,6)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25947,5)) LISTEN 0 128 *:12345 *:* users:(("haproxy",25946,4)) So I guess that indeed, if not all the processes a frontend is bound to have a corresponding bind line, this can cause connection issues as some incoming connections will be distributed to queues that nobody listens to. I'm willing to commit this patch to make things cleaner and more reliable. Here I'm getting the exact same performance with and without. Christian you may want to apply it by hand to test if it improves the behaviour for you. First of all thanks for the quick response! I've applied your patch and I just looked at the performance so far. The performance is still the same, so the lessperformant one is still less performant than the moreperformant.cfg. So from the performance point of view there's no difference between with and without that patch. We have a (IMHO) quite huge haproxy config with around 200 frontends, ~180 backends and ~90 listener. 
So the intention was to combine a http bind with a ssl bind in one frontend so that we keep it at a minimum that is necessary to get it working properly: listen ssl-relay mode tcp bind-process 2 bind :443 process 2 tcp-request inspect-delay 7s acl HAS_ECC req.ssl_ec_ext eq 1 tcp-request content accept if { req_ssl_hello_type 1 } # Client Hello use-server ecc if HAS_ECC server ecc unix@/var/run/haproxy_ssl_ecc.sock send-proxy-v2 use-server rsa if !HAS_ECC server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2 frontend http_https_combined mode http bind-process 1-40 bind :80 process 1 bind unix@/var/run/haproxy_ssl_ecc.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-ECC user haproxy process 4-40 bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-RSA user haproxy process 3 ... default_backend somebackend The plan was to have the same http performance as before, move SSL termination onto other/independent processes and furthermore split RSA and ECC. The "bind-process 1-40" is necessary because it's otherwise not possible to bind the SSL binds to the other processes. Please also note that you'll get a build warning that first needs another fix on listen_accept() which doesn't have the same prototype between the .c and the .h (!). I'll handle it as well. Cheers, Willy -- Regards, Christian Ruppert
nbproc 1 vs >1 performance
Hi, I've prepared a simple testcase: haproxy-moreperformant.cfg: global nbproc 40 user haproxy group haproxy maxconn 175000 defaults timeout client 300s timeout server 300s timeout queue 60s timeout connect 7s timeout http-request 10s maxconn 175000 bind-process 1 frontend haproxy_test #bind-process 1-40 bind :12345 process 1 mode http default_backend backend_test backend backend_test mode http errorfile 503 /etc/haproxy/test.error # vim: set syntax=haproxy: haproxy-lessperformant.cfg: global nbproc 40 user haproxy group haproxy maxconn 175000 defaults timeout client 300s timeout server 300s timeout queue 60s timeout connect 7s timeout http-request 10s maxconn 175000 bind-process 1 frontend haproxy_test bind-process 1-40 bind :12345 process 1 mode http default_backend backend_test backend backend_test mode http errorfile 503 /etc/haproxy/test.error # vim: set syntax=haproxy: /etc/haproxy/test.error: HTTP/1.0 200 Cache-Control: no-cache Content-Type: text/plain Test123456 The test: ab -n 5000 -c 250 http://xx.xx.xx.xx:12345 With the first config I get around ~30-33k requests/s on my test system, with the second conf (only the bind-process in the frontend section has been changed!) I just get around 26-28k requests per second. I could get similar differences when playing with nbproc 1 and >1 as well as the default "bind-process" and/or the "process 1" on the actual bind. Is it really just the multi-process overhead causing the performance drop here, even though the bind uses the first / only process anyway? -- Regards, Christian Ruppert
Re: Weird stick-tables / peers behaviour
On 2016-03-29 15:13, Christian Ruppert wrote: On 2016-03-29 10:58, Christian Ruppert wrote: Hi Willy, On 2016-03-25 18:17, Willy Tarreau wrote: On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote: I think it's even different (but could be wrong) since Christian spoke about counters suddenly doubling. The issue you faced Sylvain which I still have no idea how to fix unfortunately is that the peers applet is not always woken up when a connection establishes on the other side and it may simply miss an event, resulting in everything remaining stable and appear frozen until the connection closes. Here it seems data are exchanged but incorrect. This one could be easier to reproduce however, we'll check. OK I found it. Indeed it was easy to reproduce. The frequency counters are sent as "now - freq.date", which is a positive age compared to the current date. But on receipt, this age was *added* to the current date instead of subtracted. So since the date was always in the future, they were always expired if the activity changed side in less than the counter's measuring period (eg: 10s). I'm commiting this simple fix that you can apply to your tree for now. Cheers, Willy diff --git a/src/peers.c b/src/peers.c index c29ea73..9918dac 100644 --- a/src/peers.c +++ b/src/peers.c @@ -1153,7 +1153,7 @@ switchstate: case STD_T_FRQP: { struct freq_ctr_period data; - data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end)); + data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end)); if (!msg_cur) { /* malformed message */ appctx->st0 = PEER_SESS_ST_ERRPROTO; Thanks a lot for the fast investigation! The proposed patch seems to do the trick :) Hrm, or not. At least not completely. 
There's still something wrong it seems: 20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601 gpc0=0 conn_cnt=682 conn_rate(1)=1 conn_cur=3 sess_cnt=1 sess_rate(1)=-1032058827 http_req_cnt=0 http_req_rate(1)=2272 http_err_cnt=3 http_err_rate(1)=1143800 bytes_in_cnt=0 bytes_out_cnt=247977 Note the sess_rate is a negative int. Some http_err_rate values seem to be affected as well. Even the http_req_rate seems to be still wrong, in some cases. 20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259 gpc0=0 conn_cnt=86 conn_rate(1)=0 conn_cur=7 sess_cnt=0 sess_rate(1)=0 http_req_cnt=0 http_req_rate(1)=349038424 http_err_cnt=6 http_err_rate(1)=0 bytes_in_cnt=0 bytes_out_cnt=3261818950 We're using httpclose so in this case it *actually* should match the conn_cnt, so 86. I haven't had enough time yet, but it looks like I had one case where now_ms(?) was used as the value, which might explain the integer overflow within http_sess_rate if that is added on top. -- Regards, Christian Ruppert
Re: Weird stick-tables / peers behaviour
On 2016-03-29 10:58, Christian Ruppert wrote: Hi Willy, On 2016-03-25 18:17, Willy Tarreau wrote: On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote: I think it's even different (but could be wrong) since Christian spoke about counters suddenly doubling. The issue you faced Sylvain which I still have no idea how to fix unfortunately is that the peers applet is not always woken up when a connection establishes on the other side and it may simply miss an event, resulting in everything remaining stable and appear frozen until the connection closes. Here it seems data are exchanged but incorrect. This one could be easier to reproduce however, we'll check. OK I found it. Indeed it was easy to reproduce. The frequency counters are sent as "now - freq.date", which is a positive age compared to the current date. But on receipt, this age was *added* to the current date instead of subtracted. So since the date was always in the future, they were always expired if the activity changed side in less than the counter's measuring period (eg: 10s). I'm commiting this simple fix that you can apply to your tree for now. Cheers, Willy diff --git a/src/peers.c b/src/peers.c index c29ea73..9918dac 100644 --- a/src/peers.c +++ b/src/peers.c @@ -1153,7 +1153,7 @@ switchstate: case STD_T_FRQP: { struct freq_ctr_period data; - data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end)); + data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end)); if (!msg_cur) { /* malformed message */ appctx->st0 = PEER_SESS_ST_ERRPROTO; Thanks a lot for the fast investigation! The proposed patch seems to do the trick :) Hrm, or not. At least not completely. 
There's still something wrong it seems: 20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601 gpc0=0 conn_cnt=682 conn_rate(1)=1 conn_cur=3 sess_cnt=1 sess_rate(1)=-1032058827 http_req_cnt=0 http_req_rate(1)=2272 http_err_cnt=3 http_err_rate(1)=1143800 bytes_in_cnt=0 bytes_out_cnt=247977 Note the sess_rate is a negative int. Some http_err_rate seems to be affected as well. Even the http_req_rate seems to be still wrong, in some cases. 20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259 gpc0=0 conn_cnt=86 conn_rate(1)=0 conn_cur=7 sess_cnt=0 sess_rate(1)=0 http_req_cnt=0 http_req_rate(1)=349038424 http_err_cnt=6 http_err_rate(1)=0 bytes_in_cnt=0 bytes_out_cnt=3261818950 We're using httpclose so in this case it *actually* should match the conn_cnt so 86. -- Regards, Christian Ruppert
Re: Weird stick-tables / peers behaviour
Hi Willy, On 2016-03-25 18:17, Willy Tarreau wrote: On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote: I think it's even different (but could be wrong) since Christian spoke about counters suddenly doubling. The issue you faced Sylvain which I still have no idea how to fix unfortunately is that the peers applet is not always woken up when a connection establishes on the other side and it may simply miss an event, resulting in everything remaining stable and appearing frozen until the connection closes. Here it seems data are exchanged but incorrect. This one could be easier to reproduce however, we'll check. OK I found it. Indeed it was easy to reproduce. The frequency counters are sent as "now - freq.date", which is a positive age compared to the current date. But on receipt, this age was *added* to the current date instead of subtracted. So since the date was always in the future, they were always expired if the activity changed side in less than the counter's measuring period (eg: 10s). I'm committing this simple fix that you can apply to your tree for now. Cheers, Willy diff --git a/src/peers.c b/src/peers.c index c29ea73..9918dac 100644 --- a/src/peers.c +++ b/src/peers.c @@ -1153,7 +1153,7 @@ switchstate: case STD_T_FRQP: { struct freq_ctr_period data; - data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end)); + data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end)); if (!msg_cur) { /* malformed message */ appctx->st0 = PEER_SESS_ST_ERRPROTO; Thanks a lot for the fast investigation! The proposed patch seems to do the trick :) -- Regards, Christian Ruppert
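The sign error the patch fixes can be illustrated with plain shell arithmetic: the peer transmits an age ("now - freq.date"), and adding it to the local clock instead of subtracting it places the counter's date in the future, so the counter always looks expired.

```shell
# now_ms/age values are arbitrary demo numbers.
now_ms=100000
age=3000                    # "now - freq.date" as sent by the peer
wrong=$((now_ms + age))     # buggy receive: date in the future -> always expired
right=$((now_ms - age))     # fixed receive: reconstructs the original date
echo "buggy=$wrong fixed=$right"
```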
Re: src_get_gpc0 seems not to work after commit f71f6f6
Hi Seri, On 2016-03-23 08:40, Sehoon Kim wrote: Hi, As below, I use stick-table for temporary acl. After commit f71f6f6, src_get_gpc0 seems not to work. So, I revert commit f71f6f6, and it works!! That's not a valid commit in the official haproxy repo, can you please check the hash again? frontend SSL-Offload bind :443 ssl crt ssl.pem ecdhe prime256v1 tcp-request connection accept if { src_get_gpc0(whitelist) eq 1 } tcp-request connection reject backend whitelist stick-table type ip size 1m expire 1h nopurge store gpc0 Thanks Seri -- Regards, Christian Ruppert
Weird stick-tables / peers behaviour
Hi all, I've just upgraded some hosts to 1.6.4 (from 1.5) and immediately got a bunch of SMS because we're using stick-tables to track the connections and monitor http_req_rate. The stick-tables data will be synced to the other peers using the "peers" section. So I set up a test case using two HAProxy instances with e.g.: global user haproxy group haproxy maxconn 1 stats socket /var/run/haproxy.stat user haproxy gid haproxy mode 600 level admin # from the anti-DoS config defaults timeout client 60s timeout server 60s timeout queue 60s timeout connect 3s timeout http-request 10s frontend test bind 0.0.0.0:8080 mode http tcp-request inspect-delay 7s tcp-request content track-sc1 src table backend_sourceip tcp-request content reject if { sc1_http_req_rate(backend_sourceip) gt 15 } http-request deny if { sc1_http_req_rate(backend_sourceip) gt 15 } peers foo_peers peer host1 172.16.0.128:8024 peer host2 172.16.0.16:8024 backend backend_sourceip # 1mio IPs, 8hrs TTL per entry for several stats per IP in 10s stick-table type ip size 1m expire 8h store gpc0,conn_cnt,conn_cur,conn_rate(10s),http_req_cnt,http_req_rate(10s),http_err_cnt,http_err_rate(10s) peers foo_peers I then have 4 terminals, two for doing a: watch "echo 'show table backend_sourceip' | socat stdio /var/run/haproxy.stat" and two for doing some "curl -Lvs http://127.0.0.1:8080" by hand. If you do some requests on the first host and some on the second you'll notice different values on one side. Also a counter may e.g. double while the other side has the correct/actual value. This results in several thousand requests on our prod. systems but according to the logs it can't be correct. Does anybody else have similar weirdness or can you guys confirm false values? The *_cnt values seem to be ok but the *_rate ones seem to be false in some cases. -- Regards, Christian Ruppert
capturing samples / evaluating conditionals
Hi, I'm trying to set up a parallel RSA/ECC setup as described here: http://blog.haproxy.com/2015/07/15/serving-ecc-and-rsa-certificates-on-same-ip-with-haproxy/ but in my case the sample was never captured and thus the ECC backend was never used until I added something else that depends on a sample, like: acl foo req_ssl_ver lt 3 tcp-request content reject if foo So I thought instead of adding something that triggers the sample capture I could use something like this: ... tcp-request inspect-delay 4s acl HAS_ECC req.ssl_ec_ext eq 1 tcp-request content reject if !HAS_ECC use_backend ssl-ecc if HAS_ECC ... That works so far but for some reason smp_fetch_req_ssl_ec_ext() is called twice. On the first call the sample buffer is empty but on the second it's filled with the actual capture and it seems to work. So my question(s) now: 1. Is/was it really intentional not to evaluate the capture if there's just something like "use_backend somebackend if { ... }"? 2. Why is the function called twice? That's only when using the ACL variant. Using the workaround with req_ssl_ver above just calls it once. -- Regards, Christian Ruppert
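For context, the switching listener from the cited blog pattern combined with the workaround above would look roughly like this (a sketch, untested; backend names as in the mail):

```
listen ssl-switch
    mode tcp
    bind :443
    tcp-request inspect-delay 4s
    acl HAS_ECC req.ssl_ec_ext eq 1
    # extra content rule that forces the capture to be evaluated
    tcp-request content accept if { req_ssl_hello_type 1 }
    use_backend ssl-ecc if HAS_ECC
    default_backend ssl-rsa
```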
Re: General SSL vs. non-SSL Performance
Hi Cyril, On 2016-03-16 16:14, Cyril Bonté wrote: Hi all, replying really quickly from a webmail, sorry for the lack of details [...] I also ran 2 parallel "ab" on two separate machines against a third one. The requests per second were around ~70 r/s per host instead of ~140. So I doubt it's an entropy problem. The issue is in your haproxy configuration : you disabled HTTP keep-alive by using "option httpclose", so you are benchmarking SSL handshakes and your values are not unusual in that case. Please try with something else, like "option http-server-close". The "option httpclose" was on purpose. Also the client could (during an attack) simply do the same and achieve the same result. I don't think that will help in such cases. -- Regards, Christian Ruppert
Re: General SSL vs. non-SSL Performance
Hi Willy, On 2016-03-17 06:05, Willy Tarreau wrote: Hi Christian, On Wed, Mar 16, 2016 at 05:25:53PM +0100, Christian Ruppert wrote: Hi Lukas, On 2016-03-16 16:53, Lukas Tribus wrote: >>The "option httpclose" was on purpose. Also the client could (during an >>attack) simply do the same and achieve the same result. I don't think >>that will help in such cases. > >So what you are actually and purposely benchmarking are SSL/TLS >handshakes, because that's the bottleneck you are trying to improve. You're right, yes. You also found the hard way why it's important to share TLS secrets between multiple front nodes, or to properly distribute the load to avoid handshakes as much as possible. I also just stumbled over this: https://software.intel.com/en-us/articles/accelerating-ssl-load-balancers-with-intel-xeon-v3-processors Might be interesting for others as well. So ECC and multi-threaded/multi-process is the way to go, it seems. >Both nginx [1] and haproxy currently do not support offloading TLS >handshakes to another thread or dedicating a thread to a TLS session. > >That's why Apache will scale better currently, because it's threaded. Hm, I haven't tried Apache yet, but would that be a huge benefit compared to a setup using nbproc > 1? Here I don't know. TLS handshakes are one large part of what made me think that we must go multi-threaded instead of multi-process over the long term, just because I want to be able to pin some tasks to some CPUs. I.e. when TLS says "handshake needed", we want to be able to migrate the task to another CPU to avoid the huge latency imposed on all other processing (e.g. 7 ms in your case). But note that people who have to deal with heavy SSL traffic actually deal with this in haproxy by using two levels of processing, one for HTTP and one for TLS.
It means that only TLS traffic can be hurt by handshakes:

listen secure
    bind :443 ssl crt foo.pem process 2-32
    mode tcp
    server clear 127.0.0.1:80

frontend clear
    bind :80 process 1
    mode http
    use_backend my_wonderful_server_farm ...

Your example would be better and easier, but we need the client IP for ACLs and so forth, which wouldn't work in tcp mode, and there would be no XFF header. So we're duplicating stuff in the frontend but use one backend. And before the Linux kernel reintroduced support for SO_REUSEPORT (in 3.9), it was common to have the single process load-balance incoming TCP connections to all other TLS processes. It then makes it possible to choose the LB algo you want, including source hash, so that a given attacker can only affect one process, for example. Willy -- Regards, Christian Ruppert
Re: General SSL vs. non-SSL Performance
Hi Aleks, On 2016-03-16 15:57, Aleksandar Lazic wrote: Hi. On 16-03-2016 15:17, Christian Ruppert wrote: Hi, this is rather HAProxy-unrelated, so more a general problem, but anyway... I did some tests with SSL vs. non-SSL performance and I wanted to share my results with you guys, but also to try to solve the actual problem. So here is what I did: [snipp] A test without SSL, using "ab": # ab -k -n 5000 -c 250 http://127.0.0.1:65410/ [snipp] That's much worse than I expected it to be. ~144 requests per second instead of 42*k*. That's more than a 99% performance drop. The cipher is moderate but secure (for now); I doubt that changing the cipher will help a lot here. nginx and HAProxy performance is almost equal, so it's not a problem with the server software. One could increase nbproc (at least in my case it only helped up to nbproc 4, Xeon E3-1281 v3) but that's just a rather minor enhancement. With those ~144 r/s you're basically lost when being under attack. How did you guys solve this problem? External SSL offloading, using hardware crypto foo, special cipher/settings tuning, simply *much* more hardware, or not at all yet? You run both client & server on the same machine. Maybe you are running out of entropy? Are you able to run the client on a different machine? BR Aleks I also ran 2 parallel "ab" on two separate machines against a third one. The requests per second were around ~70 r/s per host instead of ~140. So I doubt it's an entropy problem. -- Regards, Christian Ruppert
Re: General SSL vs. non-SSL Performance
On 2016-03-18 11:31, Christian Ruppert wrote: Hi Willy, On 2016-03-17 06:05, Willy Tarreau wrote: Hi Christian, On Wed, Mar 16, 2016 at 05:25:53PM +0100, Christian Ruppert wrote: Hi Lukas, On 2016-03-16 16:53, Lukas Tribus wrote: >>The "option httpclose" was on purpose. Also the client could (during an >>attack) simply do the same and achieve the same result. I don't think >>that will help in such cases. > >So what you are actually and purposely benchmarking are SSL/TLS >handshakes, because that's the bottleneck you are trying to improve. You're right, yes. You also found the hard way why it's important to share TLS secrets between multiple front nodes, or to properly distribute the load to avoid handshakes as much as possible. I also just stumbled over this: https://software.intel.com/en-us/articles/accelerating-ssl-load-balancers-with-intel-xeon-v3-processors Might be interesting for others as well. So ECC and multi-threaded/multi-process is the way to go, it seems. >Both nginx [1] and haproxy currently do not support offloading TLS >handshakes to another thread or dedicating a thread to a TLS session. > >That's why Apache will scale better currently, because it's threaded. Hm, I haven't tried Apache yet, but would that be a huge benefit compared to a setup using nbproc > 1? Here I don't know. TLS handshakes are one large part of what made me think that we must go multi-threaded instead of multi-process over the long term, just because I want to be able to pin some tasks to some CPUs. I.e. when TLS says "handshake needed", we want to be able to migrate the task to another CPU to avoid the huge latency imposed on all other processing (e.g. 7 ms in your case). But note that people who have to deal with heavy SSL traffic actually deal with this in haproxy by using two levels of processing, one for HTTP and one for TLS.
It means that only TLS traffic can be hurt by handshakes:

listen secure
    bind :443 ssl crt foo.pem process 2-32
    mode tcp
    server clear 127.0.0.1:80

frontend clear
    bind :80 process 1
    mode http
    use_backend my_wonderful_server_farm ...

Your example would be better and easier, but we need the client IP for ACLs and so forth, which wouldn't work in tcp mode, and there would be no XFF header. So we're duplicating stuff in the frontend but use one backend. Hm, I'm not sure how that would perform with "server ... send-proxy[-v2]" in the listen block and "bind :anotherport accept-proxy" in the frontend block, additionally. So either duplicating a lot of ACLs and so forth, or using your example (simplified) with the PROXY protocol. And before the Linux kernel reintroduced support for SO_REUSEPORT (in 3.9), it was common to have the single process load-balance incoming TCP connections to all other TLS processes. It then makes it possible to choose the LB algo you want, including source hash, so that a given attacker can only affect one process, for example. Willy -- Regards, Christian Ruppert
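Spelling out the send-proxy/accept-proxy idea, a sketch of the two-tier split that still preserves the client IP via the PROXY protocol (the internal port 8080 is a placeholder; send-proxy-v2 assumes both sides are recent enough to speak PROXY v2):

```haproxy
listen secure
    bind :443 ssl crt foo.pem process 2-32
    mode tcp
    server clear 127.0.0.1:8080 send-proxy-v2

frontend clear
    bind :80 process 1
    bind 127.0.0.1:8080 process 1 accept-proxy
    mode http
    option forwardfor        # XFF now carries the real client IP
    default_backend my_wonderful_server_farm
```

This way the HTTP frontend and its ACLs exist only once, and the TLS tier stays a plain TCP relay.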
Re: General SSL vs. non-SSL Performance
On 2016-03-17 00:14, Nenad Merdanovic wrote: Hello, On 3/16/2016 6:25 PM, Christian Ruppert wrote: Some customers may require 4096 bit keys as it seems to be much more decent than 2048 nowadays. So you may be limited here. A test with a 2048 bit cert gives me around ~770 requests per second, a test with a 256 bit ECC cert around 1600 requests per second. That's still more than a 96% difference compared to non-SSL, way better than the 4096 bit RSA one though. I also have to make sure that even some older clients can connect to the site, so I have to take a closer look at the ECC certs and ciphers then. ECC is definitely an enhancement, if there's no compatibility problem. HAproxy can, in latest versions, serve both ECC and RSA certificates depending on client support. In a fairly large environment I have found that about 85% of clients are ECC capable. Also, look at configuring TLS ticket keys and rotating them properly, as well as using keepalive. The difference in performance you are observing is fairly normal. You can measure the SSL performance of your CPU using 'openssl speed' to see how many computes/s you get without the HAproxy penalty, but the numbers should be very close. Another thing you might consider is switching to OpenSSL 1.0.2, because you have a v3 Intel Xeon which has AVX2 instruction support and will benefit from improvements done in 1.0.2. That's indeed a noticeable performance increase on RSA, but I couldn't notice any difference for ECC. In an SSL-heavy environment, we use servers with a lot of cores, albeit slower per core, and with a good DDoS-protection ruleset haven't experienced any attacks that weren't effectively mitigated. With a properly configured SSL stack in HAproxy (all of the things mentioned above), the CPU usage difference is almost negligible. And to be honest, there are not that many SSL-exhaustion attacks. For now perhaps, but more and more sites/customers want 100% https; whether it's just cool or indeed useful doesn't matter.
And I am somewhat scared that one can take down the site with very few requests just by disabling keep-alive and other features on the client side. Both nginx [1] and haproxy currently do not support offloading TLS handshakes to another thread or dedicating a thread to a TLS session. That's why Apache will scale better currently, because it's threaded. Hm, I haven't tried Apache yet, but would that be a huge benefit compared to a setup using nbproc > 1? No :) Your CPU can only give so much. Regards, Nenad -- Regards, Christian Ruppert
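On the "TLS ticket keys" point above: since HAProxy 1.6 a shared, rotatable key file can be referenced on the bind line, which is what makes session resumption work across processes and nodes. A sketch (the path and names are placeholders; as I understand it the file holds several base64-encoded 48-byte keys, one per line, rotated externally and distributed identically to every node):

```haproxy
frontend https
    bind :443 ssl crt site.pem tls-ticket-keys /etc/haproxy/tls_ticket_keys
    mode http
    default_backend app
```

Resumed sessions skip the expensive asymmetric part of the handshake, which is exactly the cost being benchmarked in this thread.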
RE: General SSL vs. non-SSL Performance
On 2016-03-16 17:56, Lukas Tribus wrote: Some customers may require 4096 bit keys as it seems to be much more decent than 2048 nowadays. I've not come across any recommendations pointing in that direction; in fact 2048-bit RSA keys are supposed to be safe for commercial use until 2030. I don't think this is a real requirement from knowledgeable people, to be frank. That's almost always the case when talking about requirements. In any case it doesn't make any sense, because if your customer really has such huge requirements you may as well switch to ECC, since you won't be able to support old clients anyway. I just compared the RSA one against ECC on ssllabs and it seems there's no difference on the browser/device compatibility topic. So we should indeed consider ECC keys. That's still more than 96% difference compared to non-SSL Well, you are basically benchmarking your stack with a TLS-specific denial of service attack. Of course the same attack without TLS won't have a noticeable effect on the stack. So that number is quite obviously high. Yeah, but to me it looks like almost anybody else will be affected as well when migrating to 100% https. A few hosts could easily take down the site by disabling keep-alive and so on on the client while doing some "valid" requests. So it's harder to notice compared to http-only, because they can use much fewer requests, connections etc. That's why Apache will scale better currently, because it's threaded. Hm, I haven't tried Apache yet, but would that be a huge benefit compared to a setup using nbproc > 1? I haven't tried it either, but yes, I would assume so. It also doesn't block other connections while handshaking new ones. Regards, Lukas -- Regards, Christian Ruppert
General SSL vs. non-SSL Performance
[beginning of the "ab" output truncated in the archive]
Write errors:           0
Keep-Alive requests:    0
Total transferred:      46 bytes
HTML transferred:       55000 bytes
Requests per second:    144.14 [#/sec] (mean)
Time per request:       1734.425 [ms] (mean)
Time per request:       6.938 [ms] (mean, across all concurrent requests)
Transfer rate:          12.95 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      326  1057  236.0   1042  1709
Processing:    35   658  210.9    660  1013
Waiting:       35   658  211.1    659  1012
Total:       1264  1716  109.3   1702  2651

Percentage of the requests served within a certain time (ms)
  50%   1702
  66%   1708
  75%   1712
  80%   1714
  90%   1720
  95%   1779
  98%   2158
  99%   2211
 100%   2651 (longest request)

That's much worse than I expected it to be. ~144 requests per second instead of 42*k*. That's more than a 99% performance drop. The cipher is moderate but secure (for now); I doubt that changing the cipher will help a lot here. nginx and HAProxy performance is almost equal, so it's not a problem with the server software. One could increase nbproc (at least in my case it only helped up to nbproc 4, Xeon E3-1281 v3) but that's just a rather minor enhancement. With those ~144 r/s you're basically lost when being under attack. How did you guys solve this problem? External SSL offloading, using hardware crypto foo, special cipher/settings tuning, simply *much* more hardware, or not at all yet? -- Regards, Christian Ruppert
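The "more than 99%" figure checks out; a quick sanity check of the arithmetic from the numbers quoted above:

```python
# Back-of-the-envelope check of the numbers in the post: ~42k req/s for
# plain HTTP vs. 144.14 req/s with a fresh TLS handshake per request
# (4096-bit RSA certificate).
plain_rps = 42_000.0   # approximate non-SSL result quoted in the post
tls_rps = 144.14       # "Requests per second" from the ab run above

drop_pct = (1 - tls_rps / plain_rps) * 100
print(f"throughput drop: {drop_pct:.2f}%")
```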
RE: General SSL vs. non-SSL Performance
Hi Lukas, On 2016-03-16 16:53, Lukas Tribus wrote: The "option httpclose" was on purpose. Also the client could (during an attack) simply do the same and achieve the same result. I don't think that will help in such cases. So what you are actually and purposely benchmarking are SSL/TLS handshakes, because that's the bottleneck you are trying to improve. You're right, yes. First of all the selected cipher is very important, as is the certificate and the RSA key size. For optimal performance, you would drop your RSA certificate and get an ECC cert. If that's not a possibility, then use 2048-bit RSA certificates. Your ab output suggests that the negotiated cipher is ECDHE-RSA-AES128-GCM-SHA256 - which is fine for RSA certificates, but your RSA certificate is 4096 bits long, which is where the performance penalty comes from - use 2048-bit certificates, or better yet use ECC certificates. Read: DO NOT USE RSA certificates longer than 2048 bits. Some customers may require 4096 bit keys as they seem to be much more decent than 2048 nowadays. So you may be limited here. A test with a 2048 bit cert gives me around ~770 requests per second, a test with a 256 bit ECC cert around 1600 requests per second. That's still more than a 96% difference compared to non-SSL, way better than the 4096 bit RSA one though. I also have to make sure that even some older clients can connect to the site, so I'll have to take a closer look at the ECC certs and ciphers then. ECC is definitely an enhancement, if there's no compatibility problem. Both nginx [1] and haproxy currently do not support offloading TLS handshakes to another thread or dedicating a thread to a TLS session. That's why Apache will scale better currently, because it's threaded. Hm, I haven't tried Apache yet, but would that be a huge benefit compared to a setup using nbproc > 1? Hope this helps, Lukas [1] https://twitter.com/ngx_vbart/status/611956593324916736 -- Regards, Christian Ruppert
Re: Custom SSL DHparams prime
On 2015-05-21 18:20, Remi Gacogne wrote: Hi, from what I've seen in the sources and documentation a default, pre-generated prime will be used as default (unless one is appended to the certificate). HAProxy uses the related functions provided by OpenSSL itself (get_rfc3526_prime_2048, ...). What I miss here is an option to specify my own dhparams file to avoid using those pre-generated ones and/or appending some to all certificates. Wouldn't it make sense to allow it to be read from a file, globally? I don't think the 2048-bit MODP group 14 used by Haproxy is at risk right now, still it can't hurt to use a large number of different groups. You can use your own dhparam by appending it to the file specified with the crt command, after your certificate chain and key. Well, I meant globally, as a default, something like:

global
    tune.ssl.default-dh-param /path/to/custom/dhparams.pem

2048 was just an example. There is 1024 and IIRC 768 as well. One might be forced to use 1024. Also, according to the documentation HAProxy wouldn't allow/use anything greater than tune.ssl.default-dh-param, which is 1024 by default - unless I misunderstood something. -- Regards, Christian Ruppert
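To make Remi's suggestion concrete, this is how per-certificate DH parameters are typically wired up (file names are placeholders; the OpenSSL commands are shown as comments):

```haproxy
# Generate custom DH parameters and append them to the PEM loaded by "crt":
#   openssl dhparam -out dhparams.pem 2048
#   cat site.crt site.key dhparams.pem > /etc/haproxy/site.pem
global
    # upper bound for DH params when none are appended to a certificate
    tune.ssl.default-dh-param 2048

frontend https
    bind :443 ssl crt /etc/haproxy/site.pem
```

The appended parameters take precedence for that certificate, so only certificates without appended DH params fall back to the built-in group capped by tune.ssl.default-dh-param.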
Custom SSL DHparams prime
Hi, from what I've seen in the sources and documentation a default, pre-generated prime will be used as default (unless one is appended to the certificate). HAProxy uses the related functions provided by OpenSSL itself (get_rfc3526_prime_2048, ...). What I miss here is an option to specify my own dhparams file to avoid using those pre-generated ones and/or appending some to all certificates. Wouldn't it make sense to allow it to be read from a file, globally? -- Regards, Christian Ruppert
Re: base32+src
Hi Yuan, On 2015-02-12 17:39, Yuan wrote: Hello Experts, Our customer's website has just been brought down by bots, and the bots are website-aware. base32+src can look at src + url. I am not good at this. I am hoping I can get some help to create the needed config. Can I do the below config;

# Begin DDOS-Protection-Config
# Monitor the number of requests sent by an IP over a period of 10 seconds
stick-table type base32+src size 1m expire 10s store gpc0,http_req_rate(10s)
tcp-request connection track-sc1 src
# Refuse a new connection from an abuser
tcp-request content reject if { src_get_gpc0 gt 0 }
# Return a 403 response for requests in an established connection
http-request deny if { src_get_gpc0 gt 0 }

I think this config is wrong. Any help, tips or a sample config using base32+src would be appreciated - maybe a link where someone posted a sample config using base32+src. I have both port 80 & port 443, with port 80 redirecting to port 443. Due to lack of time I can't help you that much, but what you're missing is increasing the gpc0 counter. You should take a look at "haproxy rate limiting" stuff; there are some good examples out there, e.g.: http://brokenhaze.com/blog/2014/03/25/how-stack-exchange-gets-the-most-out-of-haproxy/ It's also pretty easy to test with a few shells, curl and socat. I had some help from Willy about using base32+src, which I understood in theory, but I am not good enough to convert that wonderful advice into a workable config. Best regards, Yuan -- Regards, Christian Ruppert
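The missing piece mentioned in the reply - actually incrementing gpc0 - could look roughly like this. This is a sketch in 1.5-era syntax; the table name, ACL names and thresholds are placeholders, and the sc1_inc_gpc0 trick (a fetch that increments the tracked entry's gpc0 as a side effect of being evaluated) follows the Stack Exchange write-up linked above:

```haproxy
backend st_src
    stick-table type ip size 1m expire 10s store gpc0,http_req_rate(10s)

frontend www
    bind :80
    tcp-request connection track-sc1 src table st_src
    acl abuser      sc1_get_gpc0 gt 0
    acl over_limit  sc1_http_req_rate gt 15
    # evaluating mark_abuser increments gpc0 for the tracked source
    acl mark_abuser sc1_inc_gpc0 ge 0
    tcp-request content reject if abuser
    http-request deny if over_limit mark_abuser
```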
Re: Global ACLs
Hi Willy, On 2015-02-02 17:31, Willy Tarreau wrote: Hi Christian, On Mon, Feb 02, 2015 at 04:55:56PM +0100, Christian Ruppert wrote: Hey, are there some kind of global ACLs perhaps? I think that could be really useful. In my case I have ~70 frontends and ~100 backends. I often use the same ACLs on multiple frontends/backends for specific whitelisting etc. It would be extremely helpful to specify some of those ACLs in the global scope and use them where needed without having to re-define them again and again. Technically that shouldn't be much different from what happens in the local scope, should it? As I understand it, an ACL is prepared once on startup, so it shouldn't matter where that is done. Using it, i.e. actually evaluating it, would always (as before) happen in the local scope, depending on the actual layer etc. So adding support for global ACLs should be easy and helpful, or am I wrong? Did I forget something important here? Example:

global
    acl foo src 192.168.1.1
    acl foobar hdr_ip(X-Forwarded-For,-1) 192.168.1.2 # This *might* be a special case... Not yet further verified.

frontend example
    use_backend ... if foo
    use_backend ... if foobar

We've been considering this for a while now without any elegant solution. Recently while discussing with Emeric we got an idea to implement "scopes", and along these lines I think we could instead try to inherit ACLs from other frontends/backends/defaults sections. Currently defaults sections support having a name, though this name is not used internally; admins often put some notes there, such as "tcp" or a customer's id. That would be perfect, even better than just global. One could use the same ACL names but in different scopes, i.e. different layers.
Here we could have something like this:

defaults foo
    acl local src 127.0.0.1

frontend bar
    acl client src 192.168.0.0/24
    use_backend c1 if client
    use_backend c2 if foo/local

It would also bring the extra benefit of allowing complex shared configs to use their own "global" ACLs regardless of what is being used in other sections. That's just an idea, of course. Yeah, that sounds pretty decent to me. Regards, Willy -- Regards, Christian Ruppert
Global ACLs
Hey, are there some kind of global ACLs perhaps? I think that could be really useful. In my case I have ~70 frontends and ~100 backends. I often use the same ACLs on multiple frontends/backends for specific whitelisting etc. It would be extremely helpful to specify some of those ACLs in the global scope and use them where needed without having to re-define them again and again. Technically that shouldn't be much different from what happens in the local scope, should it? As I understand it, an ACL is prepared once on startup, so it shouldn't matter where that is done. Using it, i.e. actually evaluating it, would always (as before) happen in the local scope, depending on the actual layer etc. So adding support for global ACLs should be easy and helpful, or am I wrong? Did I forget something important here? Example:

global
    acl foo src 192.168.1.1
    acl foobar hdr_ip(X-Forwarded-For,-1) 192.168.1.2 # This *might* be a special case... Not yet further verified.

frontend example
    use_backend ... if foo
    use_backend ... if foobar

-- Regards, Christian Ruppert
Re: No TCP RST on tcp-request connection reject
Hi Baptiste, tarpit is pretty handy, but as far as I understood it will keep the connection open on both sides. So at some point (pretty quickly, actually) we cannot handle any more connections on that host; the host will become slow and/or unresponsive. If we close the connection on our local side but don't notify the remote side, it will probably exhaust the attacker instead, and we could handle more connections and/or free and re-use connections that have been classified as abusive. On 01/14/2015 05:28 PM, Baptiste wrote: > On Wed, Jan 14, 2015 at 5:00 PM, Christian Ruppert > wrote: >> Hey guys, >> >> just a thought... wouldn't it make sense to add an option to "tcp-request >> connection reject" to disable the actual TCP RST? So, an attacker tries to >> (keep) open a lot of ports: >> >> a) HAProxy (configured with rate limiting etc.) does a "tcp-request >> connection >> reject" which ends up as a TCP RST. The attacker gets the RST and >> immediately connects again >> b) the same as a), but the socket will be closed on the server side with no >> RST; >> nothing will be sent back to the remote side. The connections on the remote >> side >> will be kept open until timeout. >> >> Wouldn't it make sense to implement an option for b) so it can be used during >> major attacks or so? >> > > Hi Christian, > > Have you had a look at the tarpit-related options of HAProxy? > You can slow down the attack thanks to them. > > Baptiste > -- Kind regards, Christian Ruppert Systemadministrator .. Babiel GmbH Erkrather Str. 224 a D-40233 Düsseldorf Tel: 0211-179349 0 Fax: 0211-179349 29 c.rupp...@babiel.com http://www.babiel.com Managing directors: Georg Babiel, Dr. Rainer Babiel, Harald Babiel Amtsgericht Düsseldorf HRB 38633 DISCLAIMER The information transmitted in this electronic mail message may contain confidential and or privileged materials.
Any review, retransmission, dissemination or other use of or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receive such e-mails in error, please contact the sender and delete the material from any computer.
No TCP RST on tcp-request connection reject
Hey guys, just a thought... wouldn't it make sense to add an option to "tcp-request connection reject" to disable the actual TCP RST? So, an attacker tries to (keep) open a lot of ports: a) HAProxy (configured with rate limiting etc.) does a "tcp-request connection reject" which ends up as a TCP RST. The attacker gets the RST and immediately connects again. b) the same as a), but the socket will be closed on the server side with no RST; nothing will be sent back to the remote side. The connections on the remote side will be kept open until timeout. Wouldn't it make sense to implement an option for b) so it can be used during major attacks or so?
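For anyone finding this thread later: HAProxy 1.6 introduced essentially option b) as the "silent-drop" action - the connection is dropped locally without notifying the client, whose socket then lingers until it times out. A sketch (the blacklist file path is a placeholder):

```haproxy
frontend www
    bind :80
    tcp-request connection silent-drop if { src -f /etc/haproxy/abusers.lst }
```

Note that stateful firewalls between HAProxy and the client may still emit an RST on behalf of the closed connection, which partly defeats the trick.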
Re: how to update config w/o stopping the haproxy service
On 04/28/13 at 01:00PM -0400, S Ahmed wrote: > Hi, > > 1. Is there a way to update the config file without having to stop/start > the haproxy service? e.g. when I need to update the ip addresses of the > backend servers (using ec2) > > 2. During migrations, say I have 10 backend servers, what if I want to stop > taking requests for 5 of the 10 servers, is the best way to update the > config and just remove them? Or is there a smoother transition somehow > that won't cause errors during the transition? > i.e. would it be possible to finish the requests, but stop responding to > new requests for those 5 servers I want to take offline. See https://code.google.com/p/haproxy-docs/wiki/disabled You can reload HAProxy seamlessly with e.g.:

haproxy -D -p /var/run/haproxy.pid -f /etc/haproxy.cfg -sf $(cat /var/run/haproxy.pid)

Alternatively you could use the control socket via socat: https://code.google.com/p/haproxy-docs/wiki/UnixSocketCommands So e.g. "disable server backend1/server1". Or even via the stats interface with "stats admin if ...". -- Regards, Christian Ruppert
AW: use_backend: brackets/grouping not accepted in condition
Hi Bryan, I am somewhat confused now. So it sounds like the behavior of the brackets in combination with default_backend is wrong, since it seems to work fine there even with IP ACLs. And what I meant is: wouldn't it make sense to support e.g. IP ACLs with either {} or () or whatever else, to allow one to group the rules instead of writing multiple use_backend lines? For small stuff, like in my example, it would make things slightly "easier":

use_backend if somecondition (foo or bar)

vs.

use_backend if somecondition foo
use_backend if somecondition bar

Kind regards, Christian Ruppert Christian Ruppert Systemadministrator Babiel GmbH Erkrather Str. 224 a D-40233 Düsseldorf Tel: 0211-179349 0 Fax: 0211-179349 29 E-Mail: c.rupp...@babiel.com Internet: http://www.babiel.com Managing directors: Georg Babiel, Dr. Rainer Babiel, Harald Babiel Amtsgericht Düsseldorf HRB 38633 From: Bryan Talbot [mailto:btal...@aeriagames.com] Sent: Friday, 22 March 2013 16:35 To: Christian Ruppert Cc: Baptiste; HAproxy Mailing Lists Subject: Re: use_backend: brackets/grouping not accepted in condition On Fri, Mar 22, 2013 at 2:47 AM, Christian Ruppert wrote: Hi Baptiste, it is IMHO not really clear that brackets are for anonymous ACLs only. Wouldn't it make sense to support it for use_backend as well? Those two are not mutually exclusive: you can use them with use_backend, and they are for anonymous ACLs. For example: use_backend www if METH_POST or { path_beg /static /images /img /css } -Bryan
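Bryan's point, made explicit: named ACLs sit outside the braces, anonymous (inline) ACLs inside, and the two can be mixed on one use_backend line, which already covers the grouping use case. A sketch (the ACL name, host and addresses are placeholders):

```haproxy
acl is_domain1 hdr(host) -i example.org
use_backend backend_test if is_domain1 { src 192.168.1.1 192.168.1.2 }
```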
AW: use_backend: brackets/grouping not accepted in condition
Hi Baptiste, it is IMHO not really clear that brackets are for anonymous ACLs only. Wouldn't it make sense to support them for use_backend as well? It would just make things easier, in my opinion. Kind regards, Christian Ruppert ---- Christian Ruppert Systemadministrator Babiel GmbH Erkrather Str. 224 a D-40233 Düsseldorf Tel: 0211-179349 0 Fax: 0211-179349 29 E-Mail: c.rupp...@babiel.com Internet: http://www.babiel.com Managing directors: Georg Babiel, Dr. Rainer Babiel, Harald Babiel Amtsgericht Düsseldorf HRB 38633 > -Original Message- > From: Baptiste [mailto:bed...@gmail.com] > Sent: Thursday, 21 March 2013 20:00 > To: Christian Ruppert > Cc: haproxy@formilux.org > Subject: Re: use_backend: brackets/grouping not accepted in condition > > Hi Christian, > > Brackets are for anonymous ACLs only. > You seem to use named ACLs with brackets, so it can't work. > > Either you do as you said: > use_backend backend_test if request_domain1 allowed_ip_foo or > request_domain1 allowed_ip_bar > > Or with 2 use_backend lines: > use_backend backend_test if request_domain1 allowed_ip_foo > use_backend backend_test if request_domain1 allowed_ip_bar > > Baptiste > > > > On Thu, Mar 21, 2013 at 6:25 PM, Christian Ruppert > wrote: > > Hi Guys, > > > > I just tried to simplify some rules and I noticed that brackets {} don't > > work with use_backend while they work fine with default_backend.
> > > > That doesn't work: > > use_backend backend_test if request_domain1 { allowed_ip_foo or > allowed_ip_bar } > > > > That works: > > use_backend backend_test if request_domain1 allowed_ip_foo or > request_domain1 allowed_ip_bar > > > > That works as well: > > default_backend backend_main if request_domain2 { allowed_ip_foo or > allowed_ip_bar } > > > > I could also use multiple use_backend's but using brackets would make it a > lot easier and better readable IMHO. > > > > https://code.google.com/p/haproxy-docs/wiki/UsingACLs > > That also sounds like the brackets should work almost everywhere. > > > > "Some actions are only performed upon a valid condition. A condition is a > > combination of ACLs with operators. 3 operators are supported : > > > > - AND (implicit) > > - OR (explicit with the "or" keyword or the "||" operator) > > - Negation with the exclamation mark ("!") > > > > A condition is formed as a disjunctive form: > > > > [!]acl1 [!]acl2 ... [!]acln { or [!]acl1 [!]acl2 ... [!]acln } ... > > > > Such conditions are generally used after an "if" or "unless" statement, > > indicating when the condition will trigger the action." > > > > I would really like to see that fixed. Or is that on purpose? > > > > Mit freundlichen Grüßen, > > Christian Ruppert > > > > > > > > Christian Ruppert > > Systemadministrator > > > > Babiel GmbH > > Erkrather Str. 224 a > > D-40233 Düsseldorf > > > > Tel: 0211-179349 0 > > Fax: 0211-179349 29 > > E-Mail: c.rupp...@babiel.com > > Internet: http://www.babiel.com > > > > Geschäftsführer: Georg Babiel, Dr. Rainer Babiel, Harald Babiel Amtsgericht > Düsseldorf HRB 38633 > > > > ~~ DISCLAIMER ~~~ > > > > The information transmitted in this electronic mail message may contain > confidential and or privileged materials. Any review, retransmission, > dissemination or other use of or taking of any action in reliance upon, this > information by persons or entities other than the intended recipient is > prohibited. 
use_backend: brackets/grouping not accepted in condition
Hi Guys, I just tried to simplify some rules and I noticed that brackets {} don't work with use_backend while they work fine with default_backend.

That doesn't work:
use_backend backend_test if request_domain1 { allowed_ip_foo or allowed_ip_bar }

That works:
use_backend backend_test if request_domain1 allowed_ip_foo or request_domain1 allowed_ip_bar

That works as well:
default_backend backend_main if request_domain2 { allowed_ip_foo or allowed_ip_bar }

I could also use multiple use_backend lines, but using brackets would make it a lot easier and better readable IMHO. https://code.google.com/p/haproxy-docs/wiki/UsingACLs That also sounds like the brackets should work almost everywhere: "Some actions are only performed upon a valid condition. A condition is a combination of ACLs with operators. 3 operators are supported: - AND (implicit) - OR (explicit with the "or" keyword or the "||" operator) - Negation with the exclamation mark ("!") A condition is formed as a disjunctive form: [!]acl1 [!]acl2 ... [!]acln { or [!]acl1 [!]acl2 ... [!]acln } ... Such conditions are generally used after an "if" or "unless" statement, indicating when the condition will trigger the action." I would really like to see that fixed. Or is that on purpose? Kind regards, Christian Ruppert Christian Ruppert Systemadministrator Babiel GmbH Erkrather Str. 224 a D-40233 Düsseldorf Tel: 0211-179349 0 Fax: 0211-179349 29 E-Mail: c.rupp...@babiel.com Internet: http://www.babiel.com Managing directors: Georg Babiel, Dr. Rainer Babiel, Harald Babiel Amtsgericht Düsseldorf HRB 38633
IPv6 ACLs for 1.4.x
Hi guys, I saw that 1.5.x will have IPv6 ACL support. Would it be possible to backport it to 1.4.x? :) I haven't looked at the patch yet, though, so I don't know how much work it may be. -- Regards, Christian Ruppert