High load average under 1.8 with multiple draining processes

2018-01-11 Thread Samuel Reed
We've recently upgraded to HAProxy 1.8.3, which we run with `nbthread 4`
(we used to run nbproc 4 with older releases). This has generally been
good, especially for stick tables & stats.

We terminate SSL and proxy a large number of long-running TCP
connections (websockets). When configuration changes (usually a server
going up or down), a reload occurs. For many hours, there may be 2-5
active HAProxy instances as the old ones drain. We use hard-stop-after
to keep it reasonable.

I have noticed that load average gets *very* high while these stale
processes are still present. Our web tier usually sits at a load
average of 5 with 16 cores. Across the board, load averages go up
while stale HAProxy instances are active. I saw a load as high as 34
on one machine that had 5 instances running, at 100% CPU, most of it
sys. Even with just 2 instances, the loadavg is double what it is with
1. Terminating the old processes immediately brings the load down.

Is there a regression in the 1.8 series with SO_REUSEPORT and nbthread
(we didn't see this before with nbproc) or somewhere we should start
looking? We make (relatively) heavy use of stick tables for DDoS
protection purposes and terminate SSL, but aside from that our
configuration is pretty vanilla. Nothing changed from 1.7 to 1.8 except
changing nbproc to nbthread.
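
For reference, the change amounted to roughly this (values illustrative):

```
# 1.7 and earlier: one process per core
global
    nbproc 4

# 1.8: a single process with several threads, so stick tables and
# stats are shared; hard-stop-after bounds how long old processes drain
global
    nbthread 4
    hard-stop-after 1h
```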

Thanks!




Re: cannot bind socket - Need help with config file

2018-01-11 Thread Lukas Tribus
Hello,


On 11 January 2018 at 16:36, Jonathan Matthews  wrote:
> On 11 January 2018 at 00:03, Imam Toufique  wrote:
>> So, I have everything in the listen section commented out:
>>
>> frontend main
>>bind :2200
>>default_backend sftp
>>timeout client 5d
>>
>>
>> #listen stats
>> #   bind *:2200
>> #   mode tcp
>> #   maxconn 2000
>> #   option redis-check
>> #   retries 3
>> #   option redispatch
>> #   balance roundrobin
>>
>> #use_backend sftp_server
>> backend sftp
>> balance roundrobin
>> server web 10.0.15.21:2200 check weight 2
>> server nagios 10.0.15.15:2200 check weight 2
>>
>> Is that what I need, right?
>
> I suspect you won't need to have your *backend*'s ports changed to
> 2200. Your SSH server on those machines is *probably* also your SFTP
> server

That's exactly right: the backend destination port should probably be
22; there is no need to bump that one to 2200.



> As an aside, it's not clear why you're trying to do this. You've
> already hit the host-key-changing problem, and unless you have a
> *very* specific use case, your users will hit the "50% of the time I
> connect, my files have gone away" problem soon. So you've probably got
> to solve the shared-storage problem on your backends ... which turns
> them into stateless SFTP-to-FS servers.
>
> In my opinion adding haproxy as a TCP proxy in your architecture adds
> very little, if anything. If I were you, I'd strongly consider just
> sync'ing the same host key to each server, putting their IPs in a
> low-TTL DNS record, and leaving haproxy out of the setup.

With DNS round-robin instead of haproxy you have exactly the same
requirements regarding SSH keys and filesystem synchronization, with
all the disadvantages (no health checks, no direct control over the
actual load-balancing, no stats, no logs, etc.).

I'm really not sure why you'd recommend DNS RR instead of haproxy
here. Load-balancing a single-port TCP protocol between 2 backends is
a bread and butter use-case for haproxy.
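
A minimal sketch of that bread-and-butter setup, reusing the addresses
from the thread (timeouts illustrative):

```
listen sftp
    mode tcp
    bind :2200
    timeout client 5d
    timeout server 5d
    balance roundrobin
    server web    10.0.15.21:22 check
    server nagios 10.0.15.15:22 check
```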



Regards,
Lukas



Re: [PATCH]: hathread / atomic code for clang too

2018-01-11 Thread Willy TARREAU
On Thu, Jan 11, 2018 at 04:39:54PM +, David CARLIER wrote:
> :-) I forgot to mention it can be backported to 1.8 as well. Cheers!

I already did, just forgot to push.

Thanks,
Willy



Re: [PATCH]: hathread / atomic code for clang too

2018-01-11 Thread David CARLIER
:-) I forgot to mention it can be backported to 1.8 as well. Cheers!

On 11 January 2018 at 14:30, Willy TARREAU  wrote:

> On Thu, Jan 11, 2018 at 02:26:06PM +, David CARLIER wrote:
> > Hi here a tiny fix proposal for the previous commit.
>
> Hi David. That's funny, I wondered if Clang would advertise itself as gcc,
> possibly an old one, just to annoy us, and it seems it does :-)
>
> Thanks for the fix!
>
> Willy
>


Re: cannot bind socket - Need help with config file

2018-01-11 Thread Jonathan Matthews
On 11 January 2018 at 00:03, Imam Toufique  wrote:
> So, I have everything in the listen section commented out:
>
> frontend main
>bind :2200
>default_backend sftp
>timeout client 5d
>
>
> #listen stats
> #   bind *:2200
> #   mode tcp
> #   maxconn 2000
> #   option redis-check
> #   retries 3
> #   option redispatch
> #   balance roundrobin
>
> #use_backend sftp_server
> backend sftp
> balance roundrobin
> server web 10.0.15.21:2200 check weight 2
> server nagios 10.0.15.15:2200 check weight 2
>
> Is that what I need, right?

I suspect you won't need to have your *backend*'s ports changed to
2200. Your SSH server on those machines is *probably* also your SFTP
server. I don't recall if you can serve a different/sync'd host key
per port in sshd, but this might be a reason to run a different daemon
on a higher port as you're doing.
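
If you do go the different-daemon route, a hypothetical second sshd
instance might look like this (file names assumed):

```
# /etc/ssh/sshd_config_sftp -- run as: /usr/sbin/sshd -f /etc/ssh/sshd_config_sftp
Port 2200
# same key synced to every backend, so clients see one host identity
HostKey /etc/ssh/shared_sftp_host_key
Subsystem sftp internal-sftp
```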

As an aside, it's not clear why you're trying to do this. You've
already hit the host-key-changing problem, and unless you have a
*very* specific use case, your users will hit the "50% of the time I
connect, my files have gone away" problem soon. So you've probably got
to solve the shared-storage problem on your backends ... which turns
them into stateless SFTP-to-FS servers.

In my opinion adding haproxy as a TCP proxy in your architecture adds
very little, if anything. If I were you, I'd strongly consider just
sync'ing the same host key to each server, putting their IPs in a
low-TTL DNS record, and leaving haproxy out of the setup.

J



Re: [PATCH]: hathread / atomic code for clang too

2018-01-11 Thread Willy TARREAU
On Thu, Jan 11, 2018 at 02:26:06PM +, David CARLIER wrote:
> Hi here a tiny fix proposal for the previous commit.

Hi David. That's funny, I wondered if Clang would advertise itself as gcc,
possibly an old one, just to annoy us, and it seems it does :-)

Thanks for the fix!

Willy



[PATCH]: hathread / atomic code for clang too

2018-01-11 Thread David CARLIER
Hi here a tiny fix proposal for the previous commit.

Hope it is good.

Kind regards.
From c1f299b45e56c77fc51b2e773272195ddaee46a7 Mon Sep 17 00:00:00 2001
From: David Carlier 
Date: Thu, 11 Jan 2018 14:20:43 +
Subject: [PATCH] BUILD/MINOR: ancient gcc versions atomic fix

Commit 1a69af6d3892fe1946bb8babb3044d2d26afd46e introduced code for
atomic operations on gcc versions prior to 4.7. Unfortunately clang
also defines those version constants, which is misleading.
---
 include/common/hathreads.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/common/hathreads.h b/include/common/hathreads.h
index 503abbec..5f0b9695 100644
--- a/include/common/hathreads.h
+++ b/include/common/hathreads.h
@@ -100,7 +100,7 @@ extern THREAD_LOCAL unsigned long tid_bit; /* The bit corresponding to the threa
 /* TODO: thread: For now, we rely on GCC builtins but it could be a good idea to
  * have a header file regrouping all functions dealing with threads. */
 
-#if defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7)
+#if defined(__GNUC__) && (__GNUC__ < 4 || __GNUC__ == 4 && __GNUC_MINOR__ < 7) && !defined(__clang__)
 /* gcc < 4.7 */
 
 #define HA_ATOMIC_ADD(val, i)__sync_add_and_fetch(val, i)
-- 
2.15.1



Re: Print header lua script

2018-01-11 Thread Aleksandar Lazic

Hi.

I have written a blog post about how to use this lua script in a
dockerized haproxy.


https://www.me2digital.com/blog/2018/01/show-headers-in-haproxy/

I appreciate any feedback ;-)

Best regards
Aleks

-- Originalnachricht --
Von: "Aleksandar Lazic" 
An: haproxy@formilux.org
Gesendet: 10.01.2018 23:33:01
Betreff: Print header lua script


Hi.

I have the need to print the request headers which reaches the haproxy.

Today I was sitting down and started to write this small script.

Since this is my first lua & haproxy script, I'm sure it's not the
most efficient version ;-)


I appreciate any feedback ;-)

### print_headers.lua
core.register_action("print-headers", { "http-req" },
function(transaction)

   --[[
   transaction is of class TXN.
   TXN contains a property 'http' which is an instance
   of the HAProxy HTTP class
   ]]

   local hdr = transaction.http:req_get_headers()

   for key, value in pairs(hdr) do
      local mystr = key .. ": "
      for _, val2 in pairs(value) do
         mystr = mystr .. val2
      end
      core.Info(mystr)
   end

end)
###

### haproxy.cfg
global
 log /dev/log local1 debug
 lua-load /etc/haproxy/print_headers.lua

defaults
 log global
 mode http

listen proxy001
 
 http-request lua.print-headers
 
###

One issue is that I get every entry twice in the log. I assume this
could be because I defined the loglevel "debug" in the global section
while 'core.Info' logs at info level.


###
Jan 10 23:22:58 app001 haproxy[26681]: accept-encoding: deflate, gzip
Jan 10 23:22:58 app001 haproxy[26681]: accept-encoding: deflate, gzip
Jan 10 23:22:58 app001 haproxy[26681]: content-type: application/json
Jan 10 23:22:58 app001 haproxy[26681]: content-type: application/json
Jan 10 23:22:58 app001 haproxy[26681]: user-agent: curl/7.47.0
Jan 10 23:22:58 app001 haproxy[26681]: user-agent: curl/7.47.0
Jan 10 23:22:58 app001 haproxy[26681]: x-request-id: 3
Jan 10 23:22:58 app001 haproxy[26681]: x-request-id: 3
Jan 10 23:22:58 app001 haproxy[26681]: content-length: 63
Jan 10 23:22:58 app001 haproxy[26681]: content-length: 63
Jan 10 23:22:58 app001 haproxy[26681]: host: MY_HOST:1234
Jan 10 23:22:58 app001 haproxy[26681]: host: MY_HOST:1234
Jan 10 23:22:58 app001 haproxy[26681]: accept: */*
Jan 10 23:22:58 app001 haproxy[26681]: accept: */*
###

Thanks for feedback.

aleks







Re: How to parse custom PROXY protocol v2 header for custom routing in HAProxy configuration?

2018-01-11 Thread Aleksandar Lazic

Hi.

-- Originalnachricht --
Von: "Adam Sherwood" 
An: haproxy@formilux.org
Gesendet: 10.01.2018 23:40:25
Betreff: How to parse custom PROXY protocol v2 header for custom routing 
in HAProxy configuration?


I have written this up as a StackOverflow question here: 
https://stackoverflow.com/q/48195311/2081835.


When adding PROXY v2 with AWS VPC PrivateLink connected to a Network 
Load Balancer, the endpoint ID of the connecting account is added as a 
TLV. I need to use this for routing frontend to backend, but I cannot 
sort out how.


Is there a way to call a custom matcher that could do the parsing 
logic, or is this already built-in and I'm just not finding the 
documentation?


Any ideas on the topic would be super helpful. Thank you.
It looks like AWS uses the "2.2.7. Reserved type ranges" described in
https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt, so you
will need to parse this part on your own.


This could be possible in lua, maybe; I'm not an expert in lua, yet ;-)

There are Java examples in the doc link (
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#proxy-protocol
) which you have added in the stackoverflow request.


Regards
Aleks




Segfault on haproxy 1.7.10 with state file and slowstart

2018-01-11 Thread Raghu Udiyar
Hello,

Haproxy 1.7.10 segfaults when the srv_admin_state is set to
SRV_ADMF_CMAINT (0x04) for a backend server and that backend has the
`slowstart` option set.

The following configuration reproduces it :

-
# haproxy.cfg (replace  below)

global
maxconn 3
user haproxy
group haproxy
server-state-file //servers.state

log-tag haproxy
nbproc 1
cpu-map 1 2
stats socket /run/haproxy.sock level admin
stats socket /run/haproxy_op.sock mode 666 level operator

defaults
mode http
option forwardfor

option dontlognull
option httplog
log 127.0.0.1 local1 debug

timeout connect 5s
timeout client 50s
timeout server 50s
timeout http-request 8s

load-server-state-from-file global
listen admin
bind *:9002
stats enable
stats auth haproxyadmin:xxx

frontend testserver
bind *:9000
option tcp-smart-accept
option splice-request
option splice-response
default_backend testservers

backend testservers
balance roundrobin
option tcp-smart-connect
option splice-request
option splice-response
timeout server 2s
timeout queue 2s
default-server maxconn 10 slowstart 10s weight 1
server testserver15 10.0.19.10:9003 check
server testserver16 10.0.19.12:9003 check

server testserver17 169.254.0.9:9003 disabled check
server testserver20 169.254.0.9:9003 disabled check


# servers.state file

1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
srv_uweight srv_iweight srv_time_since_last_change srv_check_status
srv_check_result srv_check_health srv_check_state srv_agent_state
bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 *4* 1 1 454 6 3 4 6 0 0 0



The state *4* above for testserver20 causes the segfault, and only occurs
when slowstart is set.

The configuration check alone can reproduce it, i.e.: haproxy -c -f haproxy.cfg

The backtrace :

(gdb) bt
#0  task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1  srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at
src/server.c:626
#2  srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3  srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at
src/server.c:2289
#4  apply_server_state () at src/server.c:2664
#5  0x0044b60f in init (argc=, argc@entry=4,
argv=,
argv@entry=0x7ffe4f160d38) at src/haproxy.c:975
#6  0x004491be in main (argc=4, argv=0x7ffe4f160d38) at
src/haproxy.c:1795


The way we use the state file is to have servers with the `disabled`
option in the configuration; during scaling we update the backend
address and mark the server as active via the socket. The 169.254.0.9
address is a dummy address for the disabled servers.
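
For context, a state file like the one above is typically produced
from the admin socket before a reload; something like the following,
with the socket path taken from the config above and the output
written to wherever `server-state-file` points:

```
# dump the current server state so the next haproxy start can load it
echo "show servers state" | socat stdio /run/haproxy.sock > servers.state
```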

Can someone take a look? I couldn't find any related bugs fixed in 1.8.

Thanks
-- Raghu