High load average under 1.8 with multiple draining processes
We've recently upgraded to HAProxy 1.8.3, which we run with `nbthread 4` (we used to run `nbproc 4` with older releases). This has generally been good, especially for stick tables & stats. We terminate SSL and proxy a large number of long-running TCP connections (websockets).

When the configuration changes (usually a server going up or down), a reload occurs, and for many hours there may be 2-5 active HAProxy processes as the old ones drain. We use hard-stop-after to keep this reasonable.

I have noticed that the load average gets *very* high while these old processes are still present. Our web tier usually sits at a load average of 5 with 16 cores, but across the board, load averages go up whenever stale HAProxy processes are active. I saw load as high as 34 on one machine that had 5 processes running, at 100% CPU, most of it sys. Even with just 2 processes, the load average is double what it is with 1. Terminating the old process immediately brings the load back down.

Is there a regression in the 1.8 series with SO_REUSEPORT and nbthread (we didn't see this before with nbproc), or is there somewhere we should start looking? We make (relatively) heavy use of stick tables for DDoS protection and terminate SSL, but aside from that our configuration is pretty vanilla. Nothing changed from 1.7 to 1.8 except switching nbproc to nbthread. Thanks!
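For reference, the shape of the setup being described would look roughly like this (a sketch with illustrative values and a hypothetical cert path, not the poster's actual config). On reload, the old processes stop accepting new connections but keep serving established ones until they drain or hard-stop-after expires:

# haproxy.cfg (sketch, illustrative values only)
global
    nbthread 4
    hard-stop-after 2h    # old processes are hard-stopped after this drain window

frontend wss
    bind :443 ssl crt /etc/haproxy/site.pem   # hypothetical cert path
    mode http
    timeout client 1h     # long-lived websocket connections
    default_backend app

backend app
    server app1 192.0.2.10:8080 check   # illustrative address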
Re: cannot bind socket - Need help with config file
Hello,

On 11 January 2018 at 16:36, Jonathan Matthews wrote:
> On 11 January 2018 at 00:03, Imam Toufique wrote:
>> So, I have everything in the listen section commented out:
>>
>> frontend main
>>     bind :2200
>>     default_backend sftp
>>     timeout client 5d
>>
>> #listen stats
>> #    bind *:2200
>> #    mode tcp
>> #    maxconn 2000
>> #    option redis-check
>> #    retries 3
>> #    option redispatch
>> #    balance roundrobin
>>
>> #use_backend sftp_server
>> backend sftp
>>     balance roundrobin
>>     server web 10.0.15.21:2200 check weight 2
>>     server nagios 10.0.15.15:2200 check weight 2
>>
>> Is that what I need, right?
>
> I suspect you won't need to have your *backend*'s ports changed to
> 2200. Your SSH server on those machines is *probably* also your SFTP
> server.

That's exactly right: your backend destination port should probably be 22; there is no need to bump that one to 2200.

> As an aside, it's not clear why you're trying to do this. You've
> already hit the host-key-changing problem, and unless you have a
> *very* specific use case, your users will hit the "50% of the time I
> connect, my files have gone away" problem soon. So you've probably got
> to solve the shared-storage problem on your backends ... which turns
> them in to stateless SFTP-to-FS servers.
>
> In my opinion adding haproxy as a TCP proxy in your architecture adds
> very little, if anything. If I were you, I'd strongly consider just
> sync'ing the same host key to each server, putting their IPs in a
> low-TTL DNS record, and leaving haproxy out of the setup.

With DNS round-robin instead of haproxy you have exactly the same requirements regarding SSH keys and filesystem synchronization, with all the disadvantages (no health checks, no direct control of the actual load-balancing, no stats, no logs, etc.). I'm really not sure why you'd recommend DNS RR instead of haproxy here. Load-balancing a single-port TCP protocol between 2 backends is a bread-and-butter use case for haproxy.

Regards,
Lukas
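Concretely, that bread-and-butter setup is essentially the config quoted above with the backend ports back on 22, e.g. this sketch (note `mode tcp`, which SSH needs and which the quoted snippet did not set explicitly):

# haproxy.cfg (sketch)
frontend main
    mode tcp
    bind :2200
    timeout client 5d
    default_backend sftp

backend sftp
    mode tcp
    timeout server 5d
    balance roundrobin
    server web    10.0.15.21:22 check weight 2
    server nagios 10.0.15.15:22 check weight 2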
Re: [PATCH]: hathread / atomic code for clang too
On Thu, Jan 11, 2018 at 04:39:54PM +, David CARLIER wrote:
> :-) I forgot to mention it can be backported to 1.8 as well. Cheers !

I already did, just forgot to push.

Thanks,
Willy
Re: [PATCH]: hathread / atomic code for clang too
:-) I forgot to mention it can be backported to 1.8 as well. Cheers !

On 11 January 2018 at 14:30, Willy TARREAU wrote:
> On Thu, Jan 11, 2018 at 02:26:06PM +, David CARLIER wrote:
> > Hi here a tiny fix proposal for the previous commit.
>
> Hi David. That's funny, I wondered if Clang would advertise itself as gcc,
> possibly an old one, just to annoy us, and it seems it does :-)
>
> Thanks for the fix!
>
> Willy
Re: cannot bind socket - Need help with config file
On 11 January 2018 at 00:03, Imam Toufique wrote:
> So, I have everything in the listen section commented out:
>
> frontend main
>     bind :2200
>     default_backend sftp
>     timeout client 5d
>
> #listen stats
> #    bind *:2200
> #    mode tcp
> #    maxconn 2000
> #    option redis-check
> #    retries 3
> #    option redispatch
> #    balance roundrobin
>
> #use_backend sftp_server
> backend sftp
>     balance roundrobin
>     server web 10.0.15.21:2200 check weight 2
>     server nagios 10.0.15.15:2200 check weight 2
>
> Is that what I need, right?

I suspect you won't need to have your *backend*'s ports changed to 2200. Your SSH server on those machines is *probably* also your SFTP server. I don't recall if you can serve a different/sync'd host key per port in sshd, but this might be a reason to run a different daemon on a higher port as you're doing.

As an aside, it's not clear why you're trying to do this. You've already hit the host-key-changing problem, and unless you have a *very* specific use case, your users will soon hit the "50% of the time I connect, my files have gone away" problem. So you've probably got to solve the shared-storage problem on your backends ... which turns them into stateless SFTP-to-FS servers.

In my opinion, adding haproxy as a TCP proxy to your architecture adds very little, if anything. If I were you, I'd strongly consider just sync'ing the same host key to each server, putting their IPs in a low-TTL DNS record, and leaving haproxy out of the setup.

J
Re: [PATCH]: hathread / atomic code for clang too
On Thu, Jan 11, 2018 at 02:26:06PM +, David CARLIER wrote:
> Hi here a tiny fix proposal for the previous commit.

Hi David. That's funny, I wondered if Clang would advertise itself as gcc, possibly an old one, just to annoy us, and it seems it does :-)

Thanks for the fix!

Willy
[PATCH]: hathread / atomic code for clang too
Hi, here is a tiny fix proposal for the previous commit. Hope it is good. Kind regards.

From c1f299b45e56c77fc51b2e773272195ddaee46a7 Mon Sep 17 00:00:00 2001
From: David Carlier
Date: Thu, 11 Jan 2018 14:20:43 +
Subject: [PATCH] BUILD/MINOR: ancient gcc versions atomic fix

Commit 1a69af6d3892fe1946bb8babb3044d2d26afd46e introduced code for atomics prior to gcc 4.7. Unfortunately clang defines those version constants as well, which is misleading.
---
 include/common/hathreads.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 503abbec..5f0b9695 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -100,7 +100,7 @@ extern THREAD_LOCAL unsigned long tid_bit; /* The bit corresponding to the threa
 /* TODO: thread: For now, we rely on GCC builtins but it could be a good idea to
  * have a header file regrouping all functions dealing with threads. */
-#if defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7)
+#if defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7) && !defined(__clang__)
 /* gcc < 4.7 */
 #define HA_ATOMIC_ADD(val, i)       __sync_add_and_fetch(val, i)
-- 
2.15.1
Re: Print header lua script
Hi.

I have written a blog post about how to use this lua script in a dockerized haproxy:
https://www.me2digital.com/blog/2018/01/show-headers-in-haproxy/

I appreciate any feedback ;-)

Best regards
Aleks

-- Original message --
From: "Aleksandar Lazic"
To: haproxy@formilux.org
Sent: 10.01.2018 23:33:01
Subject: Print header lua script

Hi.

I need to print the request headers that reach haproxy. Today I sat down and started to write this small script. Since this is my first lua & haproxy script, I'm sure it's not the most efficient version ;-)

I appreciate any feedback ;-)

### print_headers.lua
core.register_action("print-headers", { "http-req" }, function(transaction)
    --[[ transaction is of class TXN. TXN contains a property 'http'
         which is an instance of the HAProxy HTTP class ]]
    local hdr = transaction.http:req_get_headers()
    for key, value in pairs(hdr) do
        local mystr = key .. ": "
        for _, val2 in pairs(value) do
            mystr = mystr .. val2
        end
        core.Info(mystr)
    end
end)
###

### haproxy.cfg
global
    log /dev/log local1 debug
    lua-load /etc/haproxy/print_headers.lua

defaults
    log global
    mode http

listen proxy001
    http-request lua.print-headers
###

One issue is that I get every entry twice in the log. I assume this could be because I set the loglevel to "debug" while 'core.Info' logs at info level:

###
Jan 10 23:22:58 app001 haproxy[26681]: accept-encoding: deflate, gzip
Jan 10 23:22:58 app001 haproxy[26681]: accept-encoding: deflate, gzip
Jan 10 23:22:58 app001 haproxy[26681]: content-type: application/json
Jan 10 23:22:58 app001 haproxy[26681]: content-type: application/json
Jan 10 23:22:58 app001 haproxy[26681]: user-agent: curl/7.47.0
Jan 10 23:22:58 app001 haproxy[26681]: user-agent: curl/7.47.0
Jan 10 23:22:58 app001 haproxy[26681]: x-request-id: 3
Jan 10 23:22:58 app001 haproxy[26681]: x-request-id: 3
Jan 10 23:22:58 app001 haproxy[26681]: content-length: 63
Jan 10 23:22:58 app001 haproxy[26681]: content-length: 63
Jan 10 23:22:58 app001 haproxy[26681]: host: MY_HOST:1234
Jan 10 23:22:58 app001 haproxy[26681]: host: MY_HOST:1234
Jan 10 23:22:58 app001 haproxy[26681]: accept: */*
Jan 10 23:22:58 app001 haproxy[26681]: accept: */*
###

Thanks for feedback.
aleks
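Since the post asks for feedback on efficiency: a slightly tighter variant of the same loop (an untested sketch using only the TXN API already shown above) collects the occurrences of a multi-valued header into a table and joins them with table.concat instead of repeated string concatenation:

### print_headers_concat.lua (untested sketch)
core.register_action("print-headers", { "http-req" }, function(txn)
    -- req_get_headers() returns a table keyed by header name; each
    -- value is itself a table holding every occurrence of that header.
    local hdr = txn.http:req_get_headers()
    for name, occurrences in pairs(hdr) do
        local parts = {}
        for _, v in pairs(occurrences) do
            parts[#parts + 1] = v
        end
        -- one log line per header name, occurrences joined with ", "
        core.Info(name .. ": " .. table.concat(parts, ", "))
    end
end)
###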
Re: How to parse custom PROXY protocol v2 header for custom routing in HAProxy configuration?
Hi.

-- Original message --
From: "Adam Sherwood"
To: haproxy@formilux.org
Sent: 10.01.2018 23:40:25
Subject: How to parse custom PROXY protocol v2 header for custom routing in HAProxy configuration?

I have written this up as a StackOverflow question here: https://stackoverflow.com/q/48195311/2081835.

When adding PROXY v2 with AWS VPC PrivateLink connected to a Network Load Balancer, the endpoint ID of the connecting account is added as a TLV. I need to use this for routing frontend to backend, but I cannot sort out how.

Is there a way to call a custom matcher that could do the parsing logic, or is this already built-in and I'm just not finding the documentation?

Any ideas on the topic would be super helpful. Thank you.

It looks like AWS uses the "2.2.7. Reserved type ranges" described in https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt, so you will need to parse this part on your own. This might be possible in lua, but I'm not an expert in lua, yet ;-)

There are Java examples in the doc link (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#proxy-protocol) which you added to the stackoverflow question.

Regards
Aleks
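To sketch what that hand-parsing involves: per the proxy-protocol spec, each TLV is one type byte followed by a two-byte big-endian length and then the value; AWS's endpoint-ID TLV uses reserved type 0xEA with a first value byte of 0x01 (per the AWS docs linked above). Below is a standalone, untested Lua sketch of the walk, assuming `buf` already holds the raw TLV area as a string; getting those bytes in the first place is the real difficulty inside haproxy 1.8, which has no built-in fetch for PROXY TLVs:

### pp2_tlv.lua (untested sketch)
-- Walk the TLV area of a PROXY protocol v2 header and return the
-- AWS VPC endpoint ID, or nil if that TLV is absent.
-- Assumes buf is a Lua string starting at the first TLV.
local function find_vpce_id(buf)
    local i = 1
    while i + 2 <= #buf do
        local t = buf:byte(i)                                -- 1-byte type
        local len = buf:byte(i + 1) * 256 + buf:byte(i + 2)  -- 2-byte big-endian length
        local value = buf:sub(i + 3, i + 2 + len)
        -- 0xEA is AWS's reserved custom type; subtype 0x01 is the VPC endpoint ID
        if t == 0xEA and value:byte(1) == 0x01 then
            return value:sub(2)
        end
        i = i + 3 + len                                      -- advance to the next TLV
    end
    return nil
end
###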
Segfault on haproxy 1.7.10 with state file and slowstart
Hello,

Haproxy 1.7.10 segfaults when the srv_admin_state is set to SRV_ADMF_CMAINT (0x04) for a backend server and that backend has the `slowstart` option set. The following configuration reproduces it:

# haproxy.cfg (replace below)
global
    maxconn 3
    user haproxy
    group haproxy
    server-state-file //servers.state
    log-tag haproxy
    nbproc 1
    cpu-map 1 2
    stats socket /run/haproxy.sock level admin
    stats socket /run/haproxy_op.sock mode 666 level operator

defaults
    mode http
    option forwardfor
    option dontlognull
    option httplog
    log 127.0.0.1 local1 debug
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 8s
    load-server-state-from-file global

listen admin
    bind *:9002
    stats enable
    stats auth haproxyadmin:xxx

frontend testserver
    bind *:9000
    option tcp-smart-accept
    option splice-request
    option splice-response
    default_backend testservers

backend testservers
    balance roundrobin
    option tcp-smart-connect
    option splice-request
    option splice-response
    timeout server 2s
    timeout queue 2s
    default-server maxconn 10 slowstart 10s weight 1
    server testserver15 10.0.19.10:9003 check
    server testserver16 10.0.19.12:9003 check
    server testserver17 169.254.0.9:9003 disabled check
    server testserver20 169.254.0.9:9003 disabled check

# servers.state file
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 4 1 1 454 6 3 4 6 0 0 0

The srv_admin_state of 4 for testserver20 above causes the segfault, and only when slowstart is set. The configuration check alone can reproduce it:

haproxy -c -f haproxy.cfg

The backtrace:

(gdb) bt
#0  task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1  srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at src/server.c:626
#2  srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3  srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at src/server.c:2289
#4  apply_server_state () at src/server.c:2664
#5  0x0044b60f in init (argc=<optimized out>, argc@entry=4, argv=<optimized out>, argv@entry=0x7ffe4f160d38) at src/haproxy.c:975
#6  0x004491be in main (argc=4, argv=0x7ffe4f160d38) at src/haproxy.c:1795

The way we use the state file is to have servers with the `disabled` option in the configuration; during scaling we update the server's address and mark it as active using the socket. The 169.254.0.9 address is a dummy address for the disabled servers.

Can someone take a look? I couldn't find any related bugs fixed in 1.8.

Thanks
-- 
Raghu