Re: stats / "show servers conn" looses counter after reload

2021-02-12 Thread Christian Ruppert

On 2021-02-12 12:06, William Dauchy wrote:

Hi Christian,

On Fri, Feb 12, 2021 at 11:59 AM Christian Ruppert wrote:

Is this a bug? Can you confirm this behavior? Is there any other way I
could figure out whether a backend is currently in use?


Unfortunately, a reload does not preserve stats values. This is a known
problem; see also https://github.com/haproxy/haproxy/issues/954


Thanks, William!

I just commented on that issue.

--
Regards,
Christian Ruppert



stats / "show servers conn" looses counter after reload

2021-02-12 Thread Christian Ruppert

Hi list,

I'm not sure whether this is intended; to me it looks like a bug.
I was trying to figure out whether a backend is in use, so I was
looking at used_cur:
echo "show servers conn somebackend_rtmp" | socat stdio /var/run/haproxy.stat
# bkname/svname bkid/svid addr port - purge_delay used_cur used_max need_est unsafe_nb safe_nb idle_lim idle_cur idle_per_thr[48]
somebackend_rtmp/localhost 615/1 127.0.0.1 50643 - 5000 0 0 0 0 0 -1 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0


I then noticed that those values don't match what tcpdump shows.
I tracked it down to a reload.
A test case:
Create a large file with dd, or throttle your browser's connection to
modem speed or similar, just to keep the connection open
Do a "show servers conn" and make sure used_cur is > 0
Reload HAProxy
Notice that used_cur is 0 even though your download / session continues

I tested it with tcp as well as http mode.
I also have "expose-fd listeners" in use, in case it matters.
At least 2.2.9 and 2.3.5 are affected.
The stats backend also loses its counter values.

Is this a bug? Can you confirm this behavior? Is there any other way I 
could figure out whether a backend is currently in use?
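
For reading used_cur programmatically, a minimal sketch of parsing text in the "show servers conn" layout quoted above might look like the following. The helper name parse_used_cur is made up for this example, the sample text is shortened and uses invented counter values, and fetching the text from the stats socket (for instance via socat) is left out. The column is located by its header name rather than by a fixed position. Given the reload behaviour described above, the value only reflects connections known to the current process.

```python
def parse_used_cur(output):
    """Map "bkname/svname" to used_cur from 'show servers conn' text.

    Hypothetical helper: expects one header line starting with '#'
    followed by one whitespace-separated line per server.
    """
    used_cur_index = None
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#":
            # Header line: locate the used_cur column by name.
            used_cur_index = fields[1:].index("used_cur")
            continue
        if used_cur_index is None:
            raise ValueError("header line with used_cur not found")
        result[fields[0]] = int(fields[used_cur_index])
    return result


# Shortened sample with invented counter values (assumption, not real output).
sample = (
    "# bkname/svname bkid/svid addr port - purge_delay used_cur used_max\n"
    "somebackend_rtmp/localhost 615/1 127.0.0.1 50643 - 5000 3 7\n"
)
print(parse_used_cur(sample))
```

A backend could then be treated as "in use" when any of its servers reports used_cur > 0, with the caveat that a reload resets the counter.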


--
Regards,
Christian Ruppert



Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"

2021-02-10 Thread Christian Ruppert

On 2021-02-10 18:15, Christopher Faulet wrote:

Le 08/02/2021 à 14:31, Christian Ruppert a écrit :

Hi list, Christopher,

we're having issues with the mentioned commit / patch:
d13afbcce5e664f9cfe797eee8c527e5fa947f1b
https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b

I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
useful details yet, just that our Jira fails to load.
A curl against the site seems to work fine, while browser requests
(Chrome / Firefox), or at least some of them, seem to time out.

See the attached log. The first 3 requests seem to be fine so far. Then,
much later, there's a 504 between more 200s.
I'm not sure yet why the other 200s there seem to wait / are logged
after the actual timeout happens. According to Chrome's F12 there are
more requests still pending.
Ignore the 503 there. That seems to be an unrelated problem, since this
also happens with a working HAProxy.

Much later, the site loaded, sometimes broken though.

I'll try to prepare a config snippet if required.

Is anything known already?



Hi,

Thanks to information that Christian provided me offlist, I've finally
found and fixed the bug. The corresponding commit is:

commit a22782b597ee9a3bfecb18a66e29633c8e814216
Author: Christopher Faulet
Date:   Mon Feb 8 17:18:01 2021 +0100

    BUG/MEDIUM: mux-h1: Always set CS_FL_EOI for response in MSG_DONE state

    During the message parsing, if in MSG_DONE state, the CS_FL_EOI flag must
    always be set on the conn-stream if following conditions are met :

      * It is a response or
      * It is a request but not a protocol upgrade nor a CONNECT.

    For now, there is no test on the message type (request or response). Thus
    the CS_FL_EOI flag is not set for a response with a "Connection: upgrade"
    header but not a 101 response.

    This bug was introduced by the commit 3e1748bbf ("BUG/MINOR: mux-h1: Don't
    set CS_FL_EOI too early for protocol upgrade requests"). It was backported
    as far as 2.0. Thus, this patch must also be backported as far as 2.0.


However, it is not backported yet. Thanks Christian !
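
The two conditions from the commit message can be restated as a small truth function. This is only an illustration of the rule as worded in the commit message, not the actual mux-h1 code: the function and parameter names are invented here and do not correspond to identifiers in the HAProxy source.

```python
def should_set_eoi(is_response, is_upgrade_request=False, is_connect=False):
    """Return True when end-of-input should be flagged in MSG_DONE state.

    Illustrative only: names are invented for this sketch.
    """
    if is_response:
        # Any response qualifies, including a non-101 response that
        # carries a "Connection: upgrade" header.
        return True
    # A request qualifies unless it is a protocol upgrade or a CONNECT.
    return not (is_upgrade_request or is_connect)


print(should_set_eoi(is_response=True))
print(should_set_eoi(is_response=False, is_upgrade_request=True))
print(should_set_eoi(is_response=False, is_connect=True))
print(should_set_eoi(is_response=False))
```

The regression described in the commit message corresponds to skipping the first branch, that is, applying the upgrade test without first checking whether the message is a request or a response.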


Thanks for the very fast patching, Christopher! I've rolled out the new 
version on some more production machines and I haven't noticed or heard 
of any issues yet. Tomorrow I'll roll it out to the rest of our LBs.


--
Regards,
Christian Ruppert



Re: Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"

2021-02-08 Thread Christian Ruppert

On 2021-02-08 14:46, Christopher Faulet wrote:

Le 08/02/2021 à 14:31, Christian Ruppert a écrit :

Hi list, Christopher,

we're having issues with the mentioned commit / patch:
d13afbcce5e664f9cfe797eee8c527e5fa947f1b
https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b

I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
useful details yet, just that our Jira fails to load.
A curl against the site seems to work fine, while browser requests
(Chrome / Firefox), or at least some of them, seem to time out.

See the attached log. The first 3 requests seem to be fine so far. Then,
much later, there's a 504 between more 200s.
I'm not sure yet why the other 200s there seem to wait / are logged
after the actual timeout happens. According to Chrome's F12 there are
more requests still pending.
Ignore the 503 there. That seems to be an unrelated problem, since this
also happens with a working HAProxy.

Much later, the site loaded, sometimes broken though.

I'll try to prepare a config snippet if required.

Is anything known already?



Thanks Christian, I'll take a look. Could you confirm whether it
happens only with requests that have a "Connection: upgrade" header ?


This frontend doesn't have H2 enabled explicitly. I'm not really sure, but
it looks like some of those delayed requests don't have the upgrade
header.


--
Regards,
Christian Ruppert



Issues with d13afbcce5e664f9cfe797eee8c527e5fa947f1b (haproxy-2.2) "mux-h1: Don't set CS_FL_EOI too early for protocol upgrade requests"

2021-02-08 Thread Christian Ruppert

Hi list, Christopher,

we're having issues with the mentioned commit / patch:
d13afbcce5e664f9cfe797eee8c527e5fa947f1b
https://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=d13afbcce5e664f9cfe797eee8c527e5fa947f1b

I can also reproduce it with 2.2.9 as well as 2.3.5. I don't have any
useful details yet, just that our Jira fails to load.
A curl against the site seems to work fine, while browser requests
(Chrome / Firefox), or at least some of them, seem to time out.

See the attached log. The first 3 requests seem to be fine so far. Then,
much later, there's a 504 between more 200s.
I'm not sure yet why the other 200s there seem to wait / are logged
after the actual timeout happens. According to Chrome's F12 there are
more requests still pending.
Ignore the 503 there. That seems to be an unrelated problem, since this
also happens with a working HAProxy.

Much later, the site loaded, sometimes broken though.

I'll try to prepare a config snippet if required.

Is anything known already?

--
Regards,
Christian Ruppert

1.2.3.4:48262 [08/Feb/2021:14:07:46.764] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/0/42/42 200 11980 - -  
2/1/0/0/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36|||} 
"GET /secure/Dashboard.jspa HTTP/1.1"
1.2.3.4:48274 [08/Feb/2021:14:07:47.012] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/0/8/8 200 732 - -  
11/6/4/4/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/d41d8cd98f00b204e9800998ecf8427e-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component.css
 HTTP/1.1"
1.2.3.4:48278 [08/Feb/2021:14:07:47.012] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/1/7/8 200 2594 - -  
11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/ff9c3ef8b3ac69e6c33e26ebff0feeac-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/14305ce2982b2bea8ec24ee5b182b6c7/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css
 HTTP/1.1"






1.2.3.4:48278 [08/Feb/2021:14:07:47.042] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/0/24/24 200 3460 - - 
 12/6/5/5/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/00712794fa9ae1af7f4bae6d811706f6-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-banner-component.js?locale=de-DE
 HTTP/1.1"
1.2.3.4:48274 [08/Feb/2021:14:07:47.036] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/0/-1/30 504 203 - - sH-- 
12/6/4/4/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/9a5257d0fa632d5edfa0967de8b1c7df-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/6aeee818fc5e706562782156532c027f/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=de-DE
 HTTP/1.1"
1.2.3.4:48276 [08/Feb/2021:14:07:47.012] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/1/13/300035 200 144662 - - 
sD-- 11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/fc667f1dddeeda81ff75f751da43391f-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/c2b5a025bbb84cbbb3ec1b499bc08403/_/download/contextbatch/css/atl.dashboard,jira.global,atl.general,jira.dashboard,-_super/batch.css?agile_global_admin_condition=true=true
 HTTP/1.1"
1.2.3.4:48276 [08/Feb/2021:14:12:47.046] genfrontend_23510-somecorp_jira_prod~ 
genbackend_23540-somecorp_jira_prod/localhost 0/0/0/7/7 200 861 - -  
11/6/3/3/0 0/0 {jira.somecorp.com|Mozilla/5.0 (X11; Linux x86_64) 
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 
Safari/537.36|https://jira.somecorp.com/secure/Dashboard.jspa||} "GET 
/s/d41d8cd98f00b204e9800998ecf8427e-CDN/-98lwuj/813002/490f9a4ca6ac70d1532e1a0dd1cb197c/3.0.4/_/download/batch/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-lib/com.atlassian.jira.jira-tzdetect-plugin:tzdetect-lib.js
 HTTP/1.1"
1.2.3.4:48274 [08/Feb/2021:14:12:47.058] 

Re: [PR] hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included

2020-12-22 Thread Christian Ruppert

No Problem at all. Feel free :)

On 2020-12-21 12:59, Willy Tarreau wrote:

On Mon, Dec 21, 2020 at 12:20:36PM +0100, Christian Ruppert wrote:

>   2) we include  from this file so that it becomes
> consistent
>  with everything else ;
>
>   3) we add the ifdef VAR_ARRAY directly into the file so that it
> continues
>  not to depend on anything and can be directly imported into other
>  projects as needed.
>
> I guess I prefer the 3rd option here as it's extremely cheap and will
> keep external build setups very straightforward. What do you think ?
>
> Thanks!
> Willy

2. and 3. sound good. 3., however, does indeed seem to be the best solution.


OK, do you mind if I just modify your patch and commit message according
to this ? Or do you prefer to send a new one ? I'm asking because while
I usually have no problem modifying patches or commit messages, I don't
do it when they're signed.

Thanks,
Willy


--
Regards,
Christian Ruppert



Re: [PR] hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included

2020-12-21 Thread Christian Ruppert

Hey Willy,

On 2020-12-21 11:36, Willy Tarreau wrote:

Hi,

On Sun, Dec 20, 2020 at 12:58:52PM +0500,  ??? wrote:

ping :)


Oh I completely missed this one in the noise it seems! I'm sorry.


No problem! :)




> Author: Christian Ruppert 
> Number of patches: 1
>
> This is an automated relay of the Github pull request:
>hpack-tbl-t.h uses VAR_ARRAY and requires compiler.h to be included


I initially tried hard not to put haproxy-specific dependencies in these
protocol-specific parts so that they could easily be reused by other
projects if needed (hence the relaxed MIT license). But I guess adding
compiler.h is not that big of a deal. However I disagree with including
it from the same directory with double-quotes, as we try to keep our
includes more or less ordered with certain dependencies.

Thus Christian, I can offer 3 possibilities here, I don't know which
one best suits your use case:

  1) we include  from this file. It will best follow
     the current practices all over the code, but may or may not work for
     your use case depending how you include the file;

  2) we include  from this file so that it becomes consistent
     with everything else ;

  3) we add the ifdef VAR_ARRAY directly into the file so that it continues
     not to depend on anything and can be directly imported into other
     projects as needed.

I guess I prefer the 3rd option here as it's extremely cheap and will
keep external build setups very straightforward. What do you think ?

Thanks!
Willy


2. and 3. sound good. 3., however, does indeed seem to be the best solution.

--
Regards,
Christian Ruppert



Storing src + backend or frontend name in stick-table

2020-07-16 Thread Christian Ruppert

Hi List,

Is it possible to store both the IP (src) and the frontend and/or backend
name in a stick table? We use the IP in some frontends; the
frontend/backend name is only for visibility / informational purposes.
We have pretty huge configs with several hundred frontends/backends, and
we'd like to know, for example, where a bot triggered some action.


--
Regards,
Christian Ruppert



Re: HTTP/2 in 2.1.x behaves different than in 2.0.x

2020-07-06 Thread Christian Ruppert

Hi Jerome, Willy,

thanks! Yeah, it only affected url, not path. I've fixed all cases where
we wrongly assumed that url is like path.

Thanks for clarifying!

On 2020-07-03 19:59, Willy Tarreau wrote:

On Fri, Jul 03, 2020 at 02:25:33PM +0200, Jerome Magnin wrote:

Hi Christian,

On Fri, Jul 03, 2020 at 11:02:48AM +0200, Christian Ruppert wrote:
> Hi List,
>
> we've just noticed and confirmed some strange change in behavior, depending
> on whether the request is made with HTTP 1.x or 2.x.
> [...]
> That also affects ACLs like url*/path* and probably others.
> I don't think that is intended, is it?
> That looks like a regression to me. If that is a bug/regression, then it
> might be good if it's possible to catch that one via test case (regtest).
>

This change is intentional and not a regression, it was introduced by
this commit:
http://git.haproxy.org/?p=haproxy.git;a=commit;h=30ee1efe676e8264af16bab833c621d60a72a4d7


Yep, it's the only way not to break end-to-end transmission, which is
even harder when H1 is used first and H2 behind.

Also please note that "path" is *not* broken because it's already taken
from the right place. "url" will see changes when comparing with the
previous version which would see a path in H2, or either a path or a uri
in H1. Because if you're using "url", in H1 you can already have the two
forms.

Now what haproxy does is to preserve each URL component intact. If you
change the scheme it only changes it. If you call "set-path" it will only
change the path, if you use "replace-uri" it will replace the whole uri.

I'd say that HTTP/2 with the :authority header was made very browser-centric
and went back to the origins of the URIs. It's certain that for all of us
working more on the server side it looks unusual but for those on the client
side it's more natural. Regardless, what it does was already supported by
HTTP/1 agents and even used to communicate with proxies, so it's not a
fundamental breakage, it just emphasizes something that people were not
often thinking about.

Hoping this helps,
Willy
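
To make the url/path distinction concrete, here is a small sketch using only Python's standard urllib. It is not how HAProxy implements its "url" or "path" fetches; the function name path_of is invented for the example. It merely shows that the path component is the same for an origin-form target ("/") and an absolute-form target ("https://example.com/"), whereas a comparison against the whole target string sees two different values.

```python
from urllib.parse import urlsplit


def path_of(request_target):
    """Return the path component of a request target.

    Works for origin-form ("/index.html") and absolute-form
    ("https://example.com/index.html") targets alike.
    """
    return urlsplit(request_target).path


for target in ("/", "https://example.com/"):
    # The whole target differs between the two forms; the path does not.
    print(repr(target), "->", repr(path_of(target)))
```

This mirrors the point made above: rules written against the path keep matching, while rules that assumed the whole URL always starts with "/" need adjusting.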


--
Regards,
Christian Ruppert



HTTP/2 in 2.1.x behaves different than in 2.0.x

2020-07-03 Thread Christian Ruppert

Hi List,

we've just noticed and confirmed some strange change in behavior, 
depending on whether the request is made with HTTP 1.x or 2.x.

Steps to reproduce:
HAProxy 2.1.x
A simple http frontend, including h2 + logging

tail -f /var/log/haproxy.log|grep curl

curl -s https://example.com -o /dev/null --http1.1
curl -s https://example.com -o /dev/null --http2

Notice the difference:
test_https~ backend_test/testsrv1 1/0/0/2/3 200 4075 - -  1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/1.1"
test_https~ backend_test/testsrv1 0/0/0/3/3 200 4075 - -  1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET https://example.com/ HTTP/2.0"


Now the same with HAProxy 2.0.14:
test_https~ backend_test/testsrv1 1/0/0/2/3 200 4075 - -  1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/1.1"
test_https~ backend_test/testsrv1 0/0/0/3/3 200 4075 - -  1/1/0/0/0 0/0 {example.com|curl/7.69.1|} "GET / HTTP/2.0"


That also affects ACLs like url*/path* and probably others.
I don't think that is intended, is it?
That looks like a regression to me. If that is a bug/regression, then it
might be good if it's possible to catch that one via test case
(regtest).


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-30 Thread Christian Ruppert

On 2020-03-27 16:58, Christian Ruppert wrote:

On 2020-03-27 16:49, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:

On 2020-03-27 16:27, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>> During the reload I just found something in the daemon log:
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>
>> So during the reload, this happened and seems to have caused any
>> further
>> issues/trouble.
>>
>
> That would make sense. Does that mean you have old processes hanging
> around ? Do you use seamless reload ? If so, it shouldn't attempt to
> bind the socket, but get them from the old process.

I remember that it was necessary to have a systemd wrapper around, as it
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd
though.
[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify


[...]


We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague; something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?



The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have an expose-fd listeners on the unix stats socket ? Using it
will allow the new process to connect to the old process' stats socket,
and get all the listening sockets, so that it won't have to bind them.



Oh, that sounds quite handy. I wasn't aware of it. I'll add it
soonish. Thanks for the hint!


https://www.haproxy.com/de/blog/hitless-reloads-with-haproxy-howto/
"Please note that this step does not need to be performed if your 
HAProxy configuration already contains the directive “master-worker”, or 
if it is started with the option -W."


I have steps to reproduce it:
A C sample to bind the socket (nc doesn't work for some reason):
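#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>  /* struct sockaddr_in, INADDR_ANY, htons() */
#include <unistd.h>     /* sleep() */

int main(void) {
    int sock;
    struct sockaddr_in server = {0};

    /* Create a TCP socket. */
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        printf("Failed to create socket!\n");
        return 1;
    }

    /* Occupy 0.0.0.0:1338, the port the second frontend will try to bind. */
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = htons(1338);

    if (bind(sock, (struct sockaddr *)&server, sizeof(server)) == -1) {
        printf("Failed to bind socket!\n");
        return 1;
    }

    /* Keep the socket bound until the process is killed. */
    while (1) {
        sleep(1);
    }

    return 0;
}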

gcc socket.c -o socket
./socket

Having a initial HAProxy config:
global
    user haproxy
    group haproxy

    log-send-hostname

    log 127.0.0.1 len 65535 local0

    stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin

frontend unixsocket_reload
    bind 127.0.0.1:1337
    bind unix@/run/haproxy-sockettest.sock user haproxy group root mode 600
    mode http
    log global

And starting it with systemd, ending up in:
/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Testing:
curl --unix-socket /run/haproxy-sockettest.sock http://127.0.0.1 -vs
echo help | socat unix-connect:/run/haproxy.stat stdio

Adding a second frontend to the haproxy.cfg:
frontend unixsocket_reload2
    bind 127.0.0.1:1338
    bind unix@/run/haproxy-sockettest-2.sock user haproxy group root mode 600
    mode http
    log global

systemctl reload haproxy

curl and socat don't work anymore, while the TCP socket still works.

Now restarting HAProxy with the initial config but with the adjusted stats socket:
stats socket unix@/run/haproxy.stat user haproxy gid haproxy mode 600 level admin expose-fd listeners

Note that the -x will be appended automatically (at least for systemd -Ws)

And doing the same again. curl and socat still work. The new frontend
does not, even though its UNIX socket was created.
I think the way that works is ok for me then. Thanks for pointing out 
the expose-fd listeners!
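
The curl / socat checks used in the steps above can also be scripted. The following is a rough sketch that only tests whether a UNIX stream socket accepts a connection; it does not send any command or HTTP request. The helper name unix_socket_accepts is made up for this example, and the two paths are the ones from the test config above.

```python
import socket


def unix_socket_accepts(path, timeout=1.0):
    """Return True if a stream connection to the UNIX socket succeeds."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing file, a refused connection, a timeout and
        # insufficient permissions alike.
        return False
    finally:
        sock.close()


for path in ("/run/haproxy.stat", "/run/haproxy-sockettest.sock"):
    print(path, unix_socket_accepts(path))
```

Running it before and after "systemctl reload haproxy" should show whether the sockets survive the reload, with the caveat that a False result does not distinguish between the different failure causes.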





Regards,

Olivier


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

On 2020-03-27 16:49, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:32:21PM +0100, Christian Ruppert wrote:

On 2020-03-27 16:27, Olivier Houchard wrote:
> On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:
>> During the reload I just found something in the daemon log:
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
>> Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
>> Starting proxy someotherlistener: cannot bind socket [:::18540]
>>
>> So during the reload, this happened and seems to have caused any
>> further
>> issues/trouble.
>>
>
> That would make sense. Does that mean you have old processes hanging
> around ? Do you use seamless reload ? If so, it shouldn't attempt to
> bind the socket, but get them from the old process.

I remember that it was necessary to have a systemd wrapper around, as it
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd
though.
[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify


[...]


We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague; something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?



The wrapper is no longer needed, it has been superseded by the
master-worker (which you seem to be using, given you're using -Ws).
It is possible the old process refuses to die, and you end up hitting the
timeout and it gets killed eventually, but it's too late.
Do you have an expose-fd listeners on the unix stats socket ? Using it
will allow the new process to connect to the old process' stats socket,
and get all the listening sockets, so that it won't have to bind them.



Oh, that sounds quite handy. I wasn't aware of it. I'll add it soonish. 
Thanks for the hint!



Regards,

Olivier


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

On 2020-03-27 16:27, Olivier Houchard wrote:

On Fri, Mar 27, 2020 at 04:21:20PM +0100, Christian Ruppert wrote:

During the reload I just found something in the daemon log:
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) :
Starting proxy someotherlistener: cannot bind socket [:::18540]

So during the reload, this happened and seems to have caused any further
issues/trouble.



That would make sense. Does that mean you have old processes hanging
around ? Do you use seamless reload ? If so, it shouldn't attempt to
bind the socket, but get them from the old process.


I remember that it was necessary to have a systemd wrapper around, as it 
caused trouble otherwise, due to PID being changed etc.
Not sure if that wrapper is still in use. In this case it's systemd 
though.

[Unit]
Description=HAProxy Load Balancer
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
SuccessExitStatus=143
TimeoutStopSec=30
Type=notify

# The following lines leverage SystemD's sandboxing options to provide
# defense in depth protection at the expense of restricting some flexibility
# in your setup (e.g. placement of your configuration files) or possibly
# reduced performance. See systemd.service(5) and systemd.exec(5) for further
# information.

# NoNewPrivileges=true
# ProtectHome=true
# If you want to use 'ProtectSystem=strict' you should whitelist the PIDFILE,
# any state files and any other files written using 'ReadWritePaths' or
# 'RuntimeDirectory'.
# ProtectSystem=true
# ProtectKernelTunables=true
# ProtectKernelModules=true
# ProtectControlGroups=true
# If your SystemD version supports them, you can add: @reboot, @swap, @sync
# SystemCallFilter=~@cpu-emulation @keyring @module @obsolete @raw-io

[Install]
WantedBy=multi-user.target


We've added the TimeoutStopSec=30 for some reason (I'd have to ask my
colleague; something took longer or something like that, since we have
quite a lot of frontends/listeners/backends).
Only the two processes I mentioned before are / were running. Seems like
the fallback didn't work properly?




Regards,

Olivier


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

During the reload I just found something in the daemon log:
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
Starting proxy someotherlistener: cannot bind socket [0.0.0.0:18540]
Mar 27 13:37:54 somelb haproxy[20799]: [ALERT] 086/133748 (20799) : 
Starting proxy someotherlistener: cannot bind socket [:::18540]


So during the reload, this happened and seems to have caused any further 
issues/trouble.


On 2020-03-27 15:10, Christian Ruppert wrote:

So now I looked for more of those "SC"'s in the log, from our
monitoring and it appeared first around 13:38:01.
Around 13:37:54 a reload was issued by puppet or rundeck.
So right now, it seems that something happened during the reload which
affected UNIX sockets.

On 2020-03-27 15:00, Christian Ruppert wrote:

Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, for the second time: *some* SSL sockets
seem to be broken, as well as stats sockets.
HTTP still seems to work fine; the SSL ones are broken, however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest is for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks that way.

If you still have the TCP stat socket working, can you show the output
of "show fd" ?


Oh, it's the http stats listener that's still working. Not sure
whether it accepts any commands to be honest.
pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate =
219.704 kbps
Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert
So now I looked for more of those "SC"'s in the log, from our monitoring 
and it appeared first around 13:38:01.

Around 13:37:54 a reload was issued by puppet or rundeck.
So right now, it seems that something happened during the reload which 
affected UNIX sockets.


On 2020-03-27 15:00, Christian Ruppert wrote:

Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, for the second time: *some* SSL sockets
seem to be broken, as well as stats sockets.
HTTP still seems to work fine; the SSL ones are broken, however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest is for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks that way.


If you still have the TCP stat socket working, can you show the output
of "show fd" ?


Oh, it's the http stats listener that's still working. Not sure
whether it accepts any commands to be honest.
pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate =
219.704 kbps
Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert



Re: Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert

Hi Olivier,

On 2020-03-27 14:50, Olivier Houchard wrote:

Hi Christian,

On Fri, Mar 27, 2020 at 02:37:41PM +0100, Christian Ruppert wrote:

Hi list,

we have some weird issues now, for the second time: *some* SSL sockets
seem to be broken, as well as stats sockets.
HTTP still seems to work fine; the SSL ones are broken, however. It happened
at least on 2.1.3 and *perhaps* on 2.1.2 as well. We're not sure whether
the first time was on 2.1.2 or 2.1.3.
The one that failed today was updated yesterday, so HAProxy has an
uptime of about 24h.
We're using threads. default + HTTP is using 1 thread, 1 is dedicated
for a TCP listener/Layer-4, one is for RSA only and all the rest are for
ECC.

[...]
The problem occurred around 13:40 (CET, in case it matters at some point)


Any ideas so far?



So basically, it used to work, and suddenly you get errors on any TLS
connection ?


Yeah, right now it looks that way.


If you still have the TCP stat socket working, can you show the output
of "show fd" ?


Oh, it's the http stats listener that's still working. Not sure whether 
it accepts any commands to be honest.

pid = 21313 (process #1, nbproc = 1, nbthread = 8)
uptime = 0d 1h56m48s
system limits: memmax = unlimited; ulimit-n = 1574819
maxsock = 1574819; maxconn = 786432; maxpipes = 0
current conns = 6; current pipes = 0/0; conn rate = 43/sec; bit rate = 
219.704 kbps

Running tasks: 1/1158; idle = 100 %



Thanks !

Olivier


--
Regards,
Christian Ruppert



Weird issues with UNIX-Sockets on 2.1.x

2020-03-27 Thread Christian Ruppert
sing-field-initializers -Wtype-limits -Wshift-negative-value 
-Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT= USE_LIBCRYPT=1 USE_OPENSSL=1 
USE_LUA=1 USE_ZLIB=1 USE_NS= USE_SYSTEMD=1


Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE 
-PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD 
-PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY 
+LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO 
+OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO 
-NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER 
+PRCTL +THREAD_DUMP -EVPORTS


Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.0l  10 Sep 2019
Running on OpenSSL version : OpenSSL 1.1.0l  10 Sep 2019
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND

Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with the Prometheus exporter as a service

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services :
prometheus-exporter

Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[TRACE] trace
[COMP] compression

--
Regards,
Christian Ruppert



Re: master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use

2019-11-21 Thread Christian Ruppert

On 2019-11-20 11:05, William Lallemand wrote:

On Wed, Nov 20, 2019 at 10:19:20AM +0100, Christian Ruppert wrote:

Hi William,

thanks for the patch. I'll test it later today. What I actually wanted to
achieve is: https://cbonte.github.io/haproxy-dconv/2.0/management.html#4
Then HAProxy tries to bind to all listening ports. If some fatal errors happen
(eg: address not present on the system, permission denied), the process quits
with an error. If a socket binding fails because a port is already in use,
then the process will first send a SIGTTOU signal to all the pids specified
in the "-st" or "-sf" pid list. This is what is called the "pause" signal. It
instructs all existing haproxy processes to temporarily stop listening to
their ports so that the new process can try to bind again. During this time,
the old process continues to process existing connections. If the binding
still fails (because for example a port is shared with another daemon), then
the new process sends a SIGTTIN signal to the old processes to instruct them
to resume operations just as if nothing happened. The old processes will then
restart listening to the ports and continue to accept connections. Note that
this mechanism is system

In my test case though it failed to do so.


Well, it only works with HAProxy processes, not with other processes. There is
no mechanism to ask a process which is neither an haproxy process nor a process
which uses SO_REUSEPORT.

With HAProxy processes it will bind with SO_REUSEPORT, and will only use the
SIGTTOU/SIGTTIN signals if it fails to do so.

This part of the documentation is for HAProxy without master-worker mode;
in master-worker mode, once the master is launched successfully it is never
supposed to quit upon a reload (kill -USR2).

During a reload in master-worker mode, the master will do a -sf .
If the reload failed for any reason (bad configuration, unable to bind etc.),
the behavior is to keep the previous workers. It only tries to kill the workers
if the reload succeeds. So this is the default behavior.
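To illustrate the SO_REUSEPORT point above, here is a minimal Python sketch (not HAProxy code). It assumes a Linux-like system where socket.SO_REUSEPORT is available; the helper names bind_listener, try_bind and demo are made up for this example. It shows that a new listener can share a port only when the current holder also set SO_REUSEPORT, which is presumably why a foreign daemon such as Squid on port 80 still makes the bind fail.

```python
import errno
import socket


def bind_listener(port: int, reuseport: bool) -> socket.socket:
    """Bind and listen on 127.0.0.1:port, optionally with SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseport:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def try_bind(port: int, reuseport: bool) -> str:
    """Return "ok" if a new listener could bind, else the errno name."""
    try:
        bind_listener(port, reuseport).close()
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def demo() -> dict:
    results = {}
    # Case 1: the holder cooperates (sets SO_REUSEPORT), like an old HAProxy process.
    old = bind_listener(0, reuseport=True)  # port 0: let the kernel pick a free port
    results["cooperating holder"] = try_bind(old.getsockname()[1], reuseport=True)
    old.close()
    # Case 2: the holder is a foreign daemon that did not set SO_REUSEPORT.
    foreign = bind_listener(0, reuseport=False)
    results["foreign holder"] = try_bind(foreign.getsockname()[1], reuseport=True)
    foreign.close()
    return results


if __name__ == "__main__":
    for case, outcome in demo().items():
        print(f"{case}: {outcome}")
```

On Linux the first case should bind successfully and the second should be refused with EADDRINUSE; other systems may treat SO_REUSEPORT differently.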


Your patch seems to fix the issue. The master process won't exit 
anymore. Fallback seems to work during my initial tests. Thanks!


--
Regards,
Christian Ruppert



Re: Combining (kind of) http and tcp checks

2019-11-21 Thread Christian Ruppert

Hi Aleks,

On 2019-11-21 11:01, Aleksandar Lazic wrote:

Hi.

Am 21.11.2019 um 10:49 schrieb Christian Ruppert:

Hi list,

for an old exchange cluster I have some check listener like:
listen chk_s015023
     bind 0.0.0.0:1001
     mode http

     monitor-uri /check

     tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST }

     monitor fail if { nbsrv lt 6 }

     default-server inter 3s rise 2 fall 3

     server s015023_smtp 192.168.15.23:25 check
     server s015023_pop3 192.168.15.23:110 check
     server s015023_imap 192.168.15.23:143 check
     server s015023_https 192.168.15.23:443 check
     server s015023_imaps 192.168.15.23:993 check
     server s015023_pop3s 192.168.15.23:995 check


Which is then being used by the actual backends like:

backend bk_exchange_https
     mode http

     option httpchk HEAD /check HTTP/1.0

     server s015023 192.168.15.23:443 ssl verify none check addr 127.0.0.1 port 1001 observe layer4
     server s015024 192.168.15.24:443 ssl verify none check addr 127.0.0.1 port 1002 observe layer4

     ...


The old cluster is currently being updated and there's a built-in 
health check available for Exchange which I'd like to include.

So I was thinking about something like:
listen chk_s015023_healthcheck
     bind 0.0.0.0:1003
     mode http

     monitor-uri /check_exchange

     tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }

     monitor fail if { nbsrv lt 1 }

     default-server inter 3s rise 2 fall 3

     option httpchk GET /owa/healthcheck.htm HTTP/1.0

     server s015023_health 192.168.15.23:443 check ssl verify none


listen chk_s015023
     bind 0.0.0.0:1001
     mode http

     monitor-uri /check

     tcp-request connection reject if { nbsrv lt 7 } { src LOCALHOST }

     monitor fail if { nbsrv lt 7 }

     default-server inter 3s rise 2 fall 3

     server s015023_smtp 192.168.15.23:25 check
     server s015023_pop3 192.168.15.23:110 check
     server s015023_imap 192.168.15.23:143 check
     server s015023_https 192.168.15.23:443 check
     server s015023_imaps 192.168.15.23:993 check
     server s015023_pop3s 192.168.15.23:995 check
     server chk_s015023_healthcheck 127.0.0.1:1003 check


The new healthcheck is marked as being down/up as expected, the 
problem is, that the TCP check for that new health check "server 
chk_s015023_healthcheck 127.0.0.1:1003 check" doesn't work.
Even though we have that "tcp-request connection reject if { nbsrv lt 
1 } { src LOCALHOST }" within the new check, it doesn't seem to be 
enough for the TCP check.


Is it somehow possible to combine both checks, to make it recognize 
the new check's status properly?
I'd like to avoid using an external check script to do all those 
checks.


Maybe you can use the track feature from haproxy for that topic.
https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#5.2-track

I have never used it but it looks like exactly what you want.
1 backend for tcp checks and 1 backend for http right?

Regards
Aleks



Thanks! That seems to do the trick:
listen chk_s015023_healthcheck
    bind 0.0.0.0:1003
    mode http

    monitor-uri /check_exchange

    tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }

    monitor fail if { nbsrv lt 1 }

    default-server inter 3s rise 2 fall 3

    option httpchk GET /owa/healthcheck.htm HTTP/1.0

    server s015023_health 192.168.15.23:443 check ssl verify none

listen chk_s015023
    bind 0.0.0.0:1001
    mode http

    monitor-uri /check

    tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST }

    monitor fail if { nbsrv lt 6 }

    default-server inter 3s rise 2 fall 3

    server s015023_smtp 192.168.15.23:25 check
    server s015023_pop3 192.168.15.23:110 check
    server s015023_imap 192.168.15.23:143 check
    server s015023_https 192.168.15.23:443 track chk_s015023_healthcheck/s015023_health
    server s015023_imaps 192.168.15.23:993 check
    server s015023_pop3s 192.168.15.23:995 check
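
As a rough mental model of the config above (a sketch only, not how HAProxy implements it): a server using "track" takes over the state of the server it tracks, and "monitor fail if { nbsrv lt 6 }" then trips as soon as fewer than six servers are usable. The function names and the sample check results below are hypothetical.

```python
from typing import Dict


def nbsrv(own_checks: Dict[str, bool], tracking: Dict[str, bool]) -> int:
    """Count usable servers in one proxy.

    own_checks: servers with their own check -> last check result.
    tracking:   servers using "track" -> state of the server they track.
    """
    states = {**own_checks, **tracking}
    return sum(1 for is_up in states.values() if is_up)


def monitor_fails(usable: int, minimum: int) -> bool:
    """Mirror of 'monitor fail if { nbsrv lt <minimum> }'."""
    return usable < minimum


# Hypothetical snapshot: all five plain TCP checks pass.
tcp_checks = {
    "s015023_smtp": True,
    "s015023_pop3": True,
    "s015023_imap": True,
    "s015023_imaps": True,
    "s015023_pop3s": True,
}

if __name__ == "__main__":
    for health_up in (True, False):
        # s015023_https tracks chk_s015023_healthcheck/s015023_health
        usable = nbsrv(tcp_checks, {"s015023_https": health_up})
        print(f"healthcheck up={health_up}: nbsrv={usable}, "
              f"monitor fails={monitor_fails(usable, 6)}")
```

With the Exchange health check down, the tracked s015023_https server counts as down too, so /check on port 1001 starts failing even though the TCP port itself still answers.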

--
Regards,
Christian Ruppert



Combining (kind of) http and tcp checks

2019-11-21 Thread Christian Ruppert

Hi list,

for an old exchange cluster I have some check listener like:
listen chk_s015023
    bind 0.0.0.0:1001
    mode http

    monitor-uri /check

    tcp-request connection reject if { nbsrv lt 6 } { src LOCALHOST }

    monitor fail if { nbsrv lt 6 }

    default-server inter 3s rise 2 fall 3

    server s015023_smtp 192.168.15.23:25 check
    server s015023_pop3 192.168.15.23:110 check
    server s015023_imap 192.168.15.23:143 check
    server s015023_https 192.168.15.23:443 check
    server s015023_imaps 192.168.15.23:993 check
    server s015023_pop3s 192.168.15.23:995 check


Which is then being used by the actual backends like:

backend bk_exchange_https
    mode http

    option httpchk HEAD /check HTTP/1.0

    server s015023 192.168.15.23:443 ssl verify none check addr 127.0.0.1 port 1001 observe layer4
    server s015024 192.168.15.24:443 ssl verify none check addr 127.0.0.1 port 1002 observe layer4

    ...


The old cluster is currently being updated and there's a built-in health 
check available for Exchange which I'd like to include.

So I was thinking about something like:
listen chk_s015023_healthcheck
    bind 0.0.0.0:1003
    mode http

    monitor-uri /check_exchange

    tcp-request connection reject if { nbsrv lt 1 } { src LOCALHOST }

    monitor fail if { nbsrv lt 1 }

    default-server inter 3s rise 2 fall 3

    option httpchk GET /owa/healthcheck.htm HTTP/1.0

    server s015023_health 192.168.15.23:443 check ssl verify none


listen chk_s015023
    bind 0.0.0.0:1001
    mode http

    monitor-uri /check

    tcp-request connection reject if { nbsrv lt 7 } { src LOCALHOST }

    monitor fail if { nbsrv lt 7 }

    default-server inter 3s rise 2 fall 3

    server s015023_smtp 192.168.15.23:25 check
    server s015023_pop3 192.168.15.23:110 check
    server s015023_imap 192.168.15.23:143 check
    server s015023_https 192.168.15.23:443 check
    server s015023_imaps 192.168.15.23:993 check
    server s015023_pop3s 192.168.15.23:995 check
    server chk_s015023_healthcheck 127.0.0.1:1003 check


The new healthcheck is marked as being down/up as expected, the problem 
is, that the TCP check for that new health check "server 
chk_s015023_healthcheck 127.0.0.1:1003 check" doesn't work.
Even though we have that "tcp-request connection reject if { nbsrv lt 1 
} { src LOCALHOST }" within the new check, it doesn't seem to be enough 
for the TCP check.


Is it somehow possible to combine both checks, to make it recognize the 
new check's status properly?

I'd like to avoid using an external check script to do all those checks.

--
Regards,
Christian Ruppert



Re: master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use

2019-11-20 Thread Christian Ruppert

Hi William,

thanks for the patch. I'll test it later today.
What I actually wanted to achieve is:
https://cbonte.github.io/haproxy-dconv/2.0/management.html#4
Then HAProxy tries to bind to all listening ports. If some fatal errors happen
(eg: address not present on the system, permission denied), the process quits
with an error. If a socket binding fails because a port is already in use, then
the process will first send a SIGTTOU signal to all the pids specified in the
"-st" or "-sf" pid list. This is what is called the "pause" signal. It instructs
all existing haproxy processes to temporarily stop listening to their ports so
that the new process can try to bind again. During this time, the old process
continues to process existing connections. If the binding still fails (because
for example a port is shared with another daemon), then the new process sends a
SIGTTIN signal to the old processes to instruct them to resume operations just
as if nothing happened. The old processes will then restart listening to the
ports and continue to accept connections. Note that this mechanism is system


In my test case though it failed to do so.

On 2019-11-19 17:27, William Lallemand wrote:

On Tue, Nov 19, 2019 at 04:19:26PM +0100, William Lallemand wrote:

> I then add another bind for port 80, which is in use by squid already
> and try to reload HAProxy. It takes some time until it fails:
>
> Nov 19 14:39:21 894a0f616fec haproxy[2978]: [WARNING] 322/143921 (2978)
> : Reexecuting Master process
> ...
> Nov 19 14:39:28 894a0f616fec haproxy[2978]: [ALERT] 322/143922 (2978) :
> Starting frontend somefrontend: cannot bind socket [0.0.0.0:80]
> ...
> Nov 19 14:39:28 894a0f616fec systemd[1]: haproxy.service: Main process
> exited, code=exited, status=1/FAILURE
>
> The reload itself is still running (systemd) and will timeout after
> about 90s. After that, because of the Restart=always, I guess, it ends
> up in a restart loop.
>
> So I would have expected the master process to fall back to the
> old process and proceed with the old child until the problem has been
> fixed.
>


The patch in attachment fixes a bug where haproxy could reexecute itself in
waitpid mode with -sf -1.

I'm not sure this is your bug, but if this is the case you should see haproxy
in waitpid mode, then the master exiting with the usage message in your logs.


--
Regards,
Christian Ruppert



master-worker no-exit-on-failure with SO_REUSEPORT and a port being already in use

2019-11-19 Thread Christian Ruppert

Hi list,

I'm facing some issues with ports that are already in use and the fallback 
feature during a reload. SO_REUSEPORT already makes it easier/better, 
but not perfect, as there are still cases where it fails.
In my test case I've got a Squid running on port 80 and a HAProxy with 
"master-worker no-exit-on-failure". I am using the shipped (2.0.8) 
systemd unit file and start up HAProxy with some frontend and a bind on 
something like port 1337.
I then add another bind for port 80, which is already in use by Squid, 
and try to reload HAProxy. It takes some time until it fails:


Nov 19 14:39:21 894a0f616fec haproxy[2978]: [WARNING] 322/143921 (2978) 
: Reexecuting Master process

...
Nov 19 14:39:28 894a0f616fec haproxy[2978]: [ALERT] 322/143922 (2978) : 
Starting frontend somefrontend: cannot bind socket [0.0.0.0:80]

...
Nov 19 14:39:28 894a0f616fec systemd[1]: haproxy.service: Main process 
exited, code=exited, status=1/FAILURE


The reload itself is still running (systemd) and will timeout after 
about 90s. After that, because of the Restart=always, I guess, it ends 
up in a restart loop.


So I would have expected the master process to fall back to the 
old process and proceed with the old child until the problem has been 
fixed.


Can anybody confirm that? Is that intended?

https://cbonte.github.io/haproxy-dconv/2.0/management.html#4
https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#3.1-master-worker

--
Regards,
Christian Ruppert



Re: H/2 via Unix Sockets fails

2019-06-04 Thread Christian Ruppert

Hi Jarno,

On 2019-06-04 12:44, Jarno Huuskonen wrote:

Hi Christian,

On Thu, Apr 25, Christian Ruppert wrote:


listen genlisten_10320-cust1.tls-tcp
acl REQ_TLS_HAS_ECC req.ssl_ec_ext eq 1
tcp-request content accept if { req_ssl_hello_type 1 } # Match
Client SSL Hello

use-server socket-10320-rsa if !REQ_TLS_HAS_ECC
	server socket-10320-rsa unix@/run/haproxy-10320-rsa.sock 
send-proxy-v2


use-server socket-10320-ecc if REQ_TLS_HAS_ECC
	server socket-10320-ecc unix@/run/haproxy-10320-ecc.sock 
send-proxy-v2


Do you need this tcp frontend for just serving both rsa/ecc
certificates ?
If so I think haproxy can do this(with openssl >= 1.0.2) with crt 
keyword:

https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#5.1-crt

-Jarno


listen genlisten_10320-cust1.tls

bind unix@/run/haproxy-10320-rsa.sock accept-proxy user haproxy
group root mode 600 ssl crt /etc/haproxy/test-rsa.pem alpn
h2,http/1.1 process 3
bind unix@/run/haproxy-10320-ecc.sock accept-proxy user haproxy
group root mode 600 ssl crt /etc/haproxy/test-ecc.pem alpn
h2,http/1.1 process 4-8


Yeah, I think we'll still need that construct. What we want to achieve 
with this kind of setup is:
One process/core for pure connections (that TCP stuff), one for HTTP, 
*one* for RSA and all the rest for ECC. RSA costs so much that it's 
really easy to (D)DoS that process which would otherwise affect all 
other processes as well. So we just want to have all that separated, 
http from https and RSA from ECC.


--
Regards,
Christian Ruppert



Re: H/2 via Unix Sockets fails

2019-04-25 Thread Christian Ruppert

Hi Jarno,

thanks, your proposal seems to work. Here's a working test config based 
on one of our production configs:


curl -kvs -o /dev/null https://127.0.0.1:10320 --http1.1

Apr 25 15:32:51 localhost haproxy[2847]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.554] genfrontend_10310-cust1 genfrontend_10310-cust1/<NOSRV> -1/-1/-1/-1/0 503 212 - - SC-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Apr 25 15:32:51 localhost haproxy[2846]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.553] genlisten_10320-cust1.tls~ genlisten_10320-cust1.tls/socket-10310 1/0/1 212 -- 1/1/0/0/0 0/0
Apr 25 15:32:51 localhost haproxy[2841]: 127.0.0.1:36880 [25/Apr/2019:15:32:51.549] genlisten_10320-cust1.tls-tcp genlisten_10320-cust1.tls-tcp/socket-10320-ecc 4/0/5 995 -- 1/1/0/0/0 0/0



curl -kvs -o /dev/null https://127.0.0.1:10320 --http2

Apr 25 15:32:59 localhost haproxy[2847]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.246] genfrontend_10310-cust1 genfrontend_10310-cust1/<NOSRV> -1/-1/-1/-1/0 503 212 - - SC-- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Apr 25 15:32:59 localhost haproxy[2845]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.243] genlisten_10320-cust1.tls~ genlisten_10320-cust1.tls/socket-10310-h2 3/0/3 184 -- 1/1/0/0/0 0/0
Apr 25 15:32:59 localhost haproxy[2841]: 127.0.0.1:36882 [25/Apr/2019:15:32:59.228] genlisten_10320-cust1.tls-tcp genlisten_10320-cust1.tls-tcp/socket-10320-ecc 16/0/19 990 CD 1/1/0/0/0 0/0


global
    nbproc 8
    # ...

listen genlisten_10320-cust1.tls-tcp
    mode tcp
    bind-process 2
    bind :10320

    log global
    option tcplog

    # ...

    tcp-request inspect-delay 7s
    acl REQ_TLS_HAS_ECC req.ssl_ec_ext eq 1
    tcp-request content accept if { req_ssl_hello_type 1 } # Match Client SSL Hello

    use-server socket-10320-rsa if !REQ_TLS_HAS_ECC
    server socket-10320-rsa unix@/run/haproxy-10320-rsa.sock send-proxy-v2

    use-server socket-10320-ecc if REQ_TLS_HAS_ECC
    server socket-10320-ecc unix@/run/haproxy-10320-ecc.sock send-proxy-v2

listen genlisten_10320-cust1.tls
    mode tcp
    log global
    option tcplog
    bind-process 3-8

    bind unix@/run/haproxy-10320-rsa.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-rsa.pem alpn h2,http/1.1 process 3
    bind unix@/run/haproxy-10320-ecc.sock accept-proxy user haproxy group root mode 600 ssl crt /etc/haproxy/test-ecc.pem alpn h2,http/1.1 process 4-8

    use-server socket-10310-h2 if { ssl_fc_alpn h2 }
    server socket-10310-h2 unix@/run/haproxy-10310-h2.sock send-proxy-v2

    use-server socket-10310 if !{ ssl_fc_alpn h2 }
    server socket-10310 unix@/run/haproxy-10310.sock send-proxy-v2

frontend genfrontend_10310-cust1
    bind :10310
    bind unix@/run/haproxy-10310-h2.sock id 210312 accept-proxy user haproxy group root mode 600 proto h2 # TLS uplink H2
    bind unix@/run/haproxy-10310.sock id 210310 accept-proxy user haproxy group root mode 600 # TLS uplink

    mode http
    option httplog
    log global

    # ...



So it would be cool if both were possible, H2 as well as H1 via that 
socket, using "alpn h2,http/1.1"


--
Regards,
Christian Ruppert



Re: H/2 via Unix Sockets fails

2019-04-24 Thread Christian Ruppert

Hi Willy,

that doesn't seem to work either, only HTTP/1.1 does.

We have several hundred listeners/frontends/backends and we're using the 
old nbproc > 1 process model.
We have the initial TCP listener that's bound to one core. It checks 
whether the client is ECC capable or not and then hands over to the second 
listener, which does the actual SSL termination with RSA/ECC on multiple 
cores, and from there it goes to the actual frontend, which is on a 
different core.
We plan to test and migrate to the threading model if it performs as 
well as the current one or even better. But actually that was meant for 
much later this year or even 2020 :(
I'm not sure if that would solve the actual problem, since we may still 
need sockets for RSA/ECC, I guess.


The initial plan was to just make it also support HTTP2 by adding "alpn 
h2,http/1.1" to the unix bind in the "h2test_tcp.tls" listener.


On 2019-04-24 15:06, Willy Tarreau wrote:

Hi Christian,

On Wed, Apr 24, 2019 at 02:29:40PM +0200, Christian Ruppert wrote:

Hi,

so I did some more tests and it seems to be an issue between h2test_tcp.tls
and the frontend, using the UNIX sockets. Adding a TCP bind to that listener
also doesn't work. Am I doing it wrong or is it a bug somewhere with H/2 and
UNIX sockets?
I also disabled the PROXY protocol - doesn't help.


I currently have no idea about this one. There should be no reason for
H2 to depend on the underlying socket type.

Hmm wait a minute. It might not be related to the UNIX sockets at all.
In fact what's happening is that your first proxy is not advertising
H2 in the ALPN connection, so the second one doesn't receive it and
negotiates H1. You could try to add "alpn h2" at the end of your server
line below :


listen h2test_tcp
mode tcp
bind :444
option tcplog
log global
server socket-444-h2test unix@/run/haproxy-444-h2test.sock 
send-proxy-v2

  ^


listen h2test_tcp.tls
mode tcp
option tcplog
log global
bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy
group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn 
h2,http/1.1
server socket-444_2 unix@/run/haproxy-444_2-h2test.sock 
send-proxy-v2

  ^

And on this one as well. However it will break your H1. What are you
trying to do exactly ? Maybe there is a simpler solution.

Willy


--
Regards,
Christian Ruppert



Re: H/2 via Unix Sockets fails

2019-04-24 Thread Christian Ruppert

Hi,

so I did some more tests and it seems to be an issue between 
h2test_tcp.tls and the frontend, using the UNIX sockets. Adding a TCP 
bind to that listener also doesn't work. Am I doing it wrong or is it a 
bug somewhere with H/2 and UNIX sockets?

I also disabled the PROXY protocol - doesn't help.

On 2019-04-23 15:57, Christian Ruppert wrote:

Hey,

we have an older setup using nbproc >1 and having a listener for the
initial tcp connection and one for the actual SSL/TLS, also using tcp
mode which then goes to the actual frontend using http mode. Each
being bound to different processes.
So here's the test config I've used:

listen h2test_tcp
    mode tcp
    bind :444
    option tcplog
    log global
    server socket-444-h2test unix@/run/haproxy-444-h2test.sock send-proxy-v2

listen h2test_tcp.tls
    mode tcp
    option tcplog
    log global
    bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn h2,http/1.1
    server socket-444_2 unix@/run/haproxy-444_2-h2test.sock send-proxy-v2

frontend some_frontend
    mode http
    log global
    bind unix@/run/haproxy-444_2-h2test.sock id 444 accept-proxy user haproxy group haproxy mode 600
    bind :80

    ...


So what I'm doing is:
curl -k4vs https://127.0.0.1:444/~idl0r/ --http1.1
curl -k4vs https://127.0.0.1:444/~idl0r/ --http2

So with HTTP/1.1 I get:
public_http backend_qasl_de/qasl1 0/0/0/0/0 200 510 - -  3/1/0/0/0 0/0 {127.0.0.1:444|curl/7.64.1|} "GET / HTTP/1.1"
h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 5/1/6 605 -- 2/1/0/0/0 0/0
h2test_tcp h2test_tcp/socket-444-h2test 1/0/6 3335 CD 1/1/0/0/0 0/0

With H/2:
public_http public_http/<NOSRV> -1/-1/-1/-1/0 400 187 - - PR-- 3/1/0/0/0 0/0 {||} "<BADREQ>"
h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 6/0/5 187 SD 2/1/0/0/0 0/0
h2test_tcp h2test_tcp/socket-444-h2test 1/0/5 2911 SD 1/1/0/0/0 0/0

curl says:
# curl -k4vs https://127.0.0.1:444/ --http2
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=...
*  start date: Mar  1 18:00:17 2019 GMT
*  expire date: May 30 18:00:17 2019 GMT
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56087e29b770)
> GET / HTTP/2
> Host: 127.0.0.1:444
> User-Agent: curl/7.64.1
> Accept: */*
>
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame.  Perhaps, peer does not support HTTP/2 properly.
* Connection #0 to host 127.0.0.1 left intact
* Closing connection 0

Can anybody else confirm that? Tested with HAProxy 1.9.6.
Any ideas what might be the reason? Right now, I'd guess that's a
problem with H/2 and those sockets on the HAProxy side.


--
Regards,
Christian Ruppert



H/2 via Unix Sockets fails

2019-04-23 Thread Christian Ruppert

Hey,

we have an older setup using nbproc >1 and having a listener for the 
initial tcp connection and one for the actual SSL/TLS, also using tcp 
mode which then goes to the actual frontend using http mode. Each being 
bound to different processes.

So here's the test config I've used:

listen h2test_tcp
    mode tcp
    bind :444
    option tcplog
    log global
    server socket-444-h2test unix@/run/haproxy-444-h2test.sock send-proxy-v2

listen h2test_tcp.tls
    mode tcp
    option tcplog
    log global
    bind unix@/run/haproxy-444-h2test.sock accept-proxy user haproxy group haproxy mode 600 ssl crt /etc/haproxy/ssl/h2test.pem alpn h2,http/1.1
    server socket-444_2 unix@/run/haproxy-444_2-h2test.sock send-proxy-v2

frontend some_frontend
    mode http
    log global
    bind unix@/run/haproxy-444_2-h2test.sock id 444 accept-proxy user haproxy group haproxy mode 600
    bind :80

    ...


So what I'm doing is:
curl -k4vs https://127.0.0.1:444/~idl0r/ --http1.1
curl -k4vs https://127.0.0.1:444/~idl0r/ --http2

So with HTTP/1.1 I get:
public_http backend_qasl_de/qasl1 0/0/0/0/0 200 510 - -  3/1/0/0/0 0/0 {127.0.0.1:444|curl/7.64.1|} "GET / HTTP/1.1"
h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 5/1/6 605 -- 2/1/0/0/0 0/0
h2test_tcp h2test_tcp/socket-444-h2test 1/0/6 3335 CD 1/1/0/0/0 0/0

With H/2:
public_http public_http/<NOSRV> -1/-1/-1/-1/0 400 187 - - PR-- 3/1/0/0/0 0/0 {||} "<BADREQ>"
h2test_tcp.tls~ h2test_tcp.tls/socket-444_2 6/0/5 187 SD 2/1/0/0/0 0/0
h2test_tcp h2test_tcp/socket-444-h2test 1/0/5 2911 SD 1/1/0/0/0 0/0

curl says:
# curl -k4vs https://127.0.0.1:444/ --http2
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 444 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=...
*  start date: Mar  1 18:00:17 2019 GMT
*  expire date: May 30 18:00:17 2019 GMT
*  issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x56087e29b770)
> GET / HTTP/2
> Host: 127.0.0.1:444
> User-Agent: curl/7.64.1
> Accept: */*
>
* http2 error: Remote peer returned unexpected data while we expected SETTINGS frame.  Perhaps, peer does not support HTTP/2 properly.

* Connection #0 to host 127.0.0.1 left intact
* Closing connection 0

Can anybody else confirm that? Tested with HAProxy 1.9.6.
Any ideas what might be the reason? Right now, I'd guess that's a 
problem with H/2 and those sockets on the HAProxy side.
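
For reading the termination flags in the log lines above (PR--, SD, CD), a small lookup helper can be handy. This is only a sketch: the descriptions are paraphrased from the "Session state at disconnection" section of the HAProxy configuration manual and cover just a subset of the codes, and the names FIRST, SECOND and decode_termination are made up here. Check the manual for the authoritative wording.

```python
# First flag: which event ended the session (subset, paraphrased).
FIRST = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "P": "aborted by the proxy",
    "c": "client-side timeout",
    "s": "server-side timeout",
    "-": "normal completion",
}

# Second flag: what the session was doing at that moment (subset, paraphrased).
SECOND = {
    "R": "waiting for a complete, valid request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for response headers from the server",
    "D": "in the data phase",
    "-": "normal completion",
}


def decode_termination(flags: str) -> tuple:
    """Decode the first two characters of a termination state such as 'SD' or 'PR--'."""
    if len(flags) < 2:
        raise ValueError("expected at least two flag characters")
    return (FIRST.get(flags[0], "unknown"), SECOND.get(flags[1], "unknown"))


if __name__ == "__main__":
    for flags in ("PR--", "SD", "CD", "--"):
        cause, phase = decode_termination(flags)
        print(f"{flags}: {cause}; {phase}")
```

Read this way, the H/2 case shows the frontend rejecting what it received as an invalid request (PR--), after which the two TCP hops see the next hop close during the data phase (SD).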


--
Regards,
Christian Ruppert



Re: `stats bind-process` broken

2019-04-12 Thread Christian Ruppert

Hey Guys,

I can confirm those issues as well as the proposed fix/workaround to 
solve the issue.
I upgraded our "nbproc" setup from 1.7.x to 1.9.6 today and noticed some 
missing entries from the stats socket, e.g.:

# echo 'show stat' | socat stdio /run/haproxy.stat|wc -l
1442

Which is correct, while after the upgrade it was indeed showing stats 
from a random proc:

# echo 'show stat' | socat stdio /run/haproxy.stat|wc -l
341

etc.

Adding the "process 1" to the "stats socket" line seems to help.
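
A quick way to verify whether the stats socket is really pinned to one process is to ask it for "show info" several times and collect the Pid lines, much like the socat loop quoted below. The following is a minimal Python sketch under assumptions: the function names are invented, the socket path is passed on the command line, and it relies on HAProxy closing the connection after answering a single command.

```python
import re
import socket
import sys


def show_info(path: str, timeout: float = 2.0) -> str:
    """Send 'show info' to the stats socket at `path` and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(b"show info\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace")


def extract_pid(info: str):
    """Return the integer after 'Pid:' in a 'show info' reply, or None."""
    match = re.search(r"^Pid:\s*(\d+)\s*$", info, re.MULTILINE)
    return int(match.group(1)) if match else None


def answering_pids(path: str, attempts: int = 5) -> set:
    """Collect the distinct PIDs that answered `attempts` consecutive queries."""
    return {extract_pid(show_info(path)) for _ in range(attempts)}


if __name__ == "__main__" and len(sys.argv) > 1:
    pids = answering_pids(sys.argv[1])
    print("answering PIDs:", sorted(p for p in pids if p is not None))
    print("pinned to one process" if len(pids) == 1 else "answers come from several processes")
```

With "process 1" on the "stats socket" line, the set should contain a single PID; without it on 1.9.6, several PIDs would be expected to show up.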

On 2019-04-11 18:24, Willy Tarreau wrote:

Hi Patrick,

On Thu, Apr 11, 2019 at 12:18:14PM -0400, Patrick Hemmer wrote:
With haproxy 1.9.6 the `stats bind-process` directive is not working.
Every connection to the socket is going to a random process:

Here's a simple reproduction:
Config:
   global
       nbproc 3
       stats socket /tmp/haproxy.sock level admin
       stats bind-process 1


Testing:
   # for i in {1..5}; do socat - unix:/tmp/haproxy.sock <<< "show info" | grep Pid: ; done
   Pid: 33371
   Pid: 33373
   Pid: 33372
   Pid: 33373
   Pid: 33373


This must be pretty annoying. I don't have memories of anything changed
regarding the bind-process stuff between 1.8 and 1.9 (the threads have
moved a lot however). It could be a side effect of some of these changes
though.

However I'm seeing that adding "process 1" on the "stats socket" line
itself fixes the problem. I suspect the issue is located in the propagation
of the frontend's mask to the listener, I'll look at this.

Thanks!
Willy



-Patrick


--
Regards,
Christian Ruppert



Re: haproxy and solarflare onload

2017-12-20 Thread Christian Ruppert

Oh, btw, I'm just reading that onload documentation.


Filters
Filters are used to deliver packets received from the wire to the appropriate
application. When filters are exhausted it is not possible to create new
accelerated sockets. The general recommendation is that applications do not
allocate more than 4096 filters - or applications should not create more than
4096 outgoing connections.
The limit does not apply to inbound connections to a listening socket.



Re: haproxy and solarflare onload

2017-12-20 Thread Christian Ruppert

Hi Elias,

I'm currently preparing a test setup including a SFN8522 + onload.
How did you measure it? When did those errors (drops/discards?) appear,
during a test or with real traffic?
The first thing I did was update the driver and firmware. Are both up to
date in your case?

I haven't measured / compared the SFN8522 against an X520 or X710 yet,
but do you have RSS / affinity or something related enabled/set? Intel
has some features and Solarflare may have its own stuff.


On 2017-12-20 11:48, Elias Abacioglu wrote:

Hi,

Yes on the LD_PRELOAD.

Yes, I have one node running with Solarflare SFN8522 2p 10Gbit/s
currently without Onload enabled.
it has 17.5K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

And I have a similar node with Intel X710 2p 10Gbit/s.
It has 26.1K http_request_rate and ~26% server interrupts on core 0
and 1 where the NIC IRQ is bound to.

both nodes have 1 socket, Intel Xeon CPU E3-1280 v6, 32 GB RAM.

So without Onload Solarflare performs worse than the X710 since it has
the same amount of SI load with less traffic. And a side note is that
I haven't compared the ethtool settings between Intel and Solarflare,
just running with the defaults of both cards.

I currently have a support ticket open with the Solarflare team about
the issues I mentioned in my previous mail. If they sort that out, I can
perhaps set up a test server, if I can manage to free up one server.
Then we can do some synthetic benchmarks with a set of parameters of
your choosing.

Regards,

/Elias

On Wed, Dec 20, 2017 at 9:48 AM, Willy Tarreau <w...@1wt.eu> wrote:


Hi Elias,

On Tue, Dec 19, 2017 at 02:23:21PM +0100, Elias Abacioglu wrote:

Hi,

I recently bought a Solarflare NIC with (ScaleOut) Onload / OpenOnload to
test it with HAProxy.

Has anyone tried running haproxy with Solarflare onload functions?


After I started haproxy with onload, this started spamming on the kernel
log:
Dec 12 14:11:54 dflb06 kernel: [357643.035355] [onload] oof_socket_add_full_hw: 6:3083 ERROR: FILTER TCP 10.3.54.43:4147 [1] 10.3.20.116:80 [2] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.064395] [onload] oof_socket_add_full_hw: 6:3491 ERROR: FILTER TCP 10.3.54.43:39321 [3] 10.3.20.113:80 [4] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.081069] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)
Dec 12 14:11:54 dflb06 kernel: [357643.082625] [onload] oof_socket_add_full_hw: 3:2124 ERROR: FILTER TCP 10.3.54.43:62403 [5] 10.3.20.30:445 [6] failed (-16)

And this in haproxy log:
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy ssl-relay reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21146]: Proxy ssl-relay reached system memory limit at 9184 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.
Dec 12 14:12:07 dflb06 haproxy[21145]: Proxy HTTP reached system memory limit at 9931 sockets. Please check system tunables.


Apparently I've hit the max hardware filter limit on the card.
Does anyone here have experience in running haproxy with onload features?

I've never got any report of any such test, though in the past I thought
it would be nice to run such a test, at least to validate the perimeter
covered by the library (you're using it as LD_PRELOAD, that's it ?).


Mind sharing insights and advice on how to get a functional setup?


I really don't know what can reasonably be expected from code trying to
partially bypass a part of the TCP stack to be honest. From what I've
read a long time ago, onload might be doing its work in a not very
intrusive way but judging by your messages above I'm having some doubts
now.

Have you tried without this software, using the card normally ? I mean,
2 years ago I had the opportunity to test haproxy on a dual-40G setup
and we reached 60 Gbps of forwarded traffic with all machines in the
test bench reaching their limits (and haproxy reaching 100% as well),
so for me that proves that the TCP stack still scales extremely well
and that while such acceleration software might make sense for a next
generation NIC running on old hardware (eg: when 400 Gbps NICs start
to appear), I'm really not convinced that it makes any sense to use
them on well supported setups like 2-4 10Gbps links which are very
common nowadays. I mean, I managed to run haproxy at 10Gbps 10 years
ago on a core2-duo! Hardware has evolved quite a bit since :-)

Regards,
Willy




Links:
--
[1] http://10.3.54.43:4147
[2] http://10.3.20.116:80
[3] http://10.3.54.43:39321
[4] http://10.3.20.113:80
[5] http://10.3.54.43:62403
[6] http://10.3.20.30:445


--
Regards,
Christian Ruppert



[PATCH] Fix linking / LDFLAGS order of some contrib modules in 1.8

2017-11-30 Thread Christian Ruppert

Hi Willy,

please see the attached patch that fixes linking / LDFLAGS order of some 
contrib modules which is related to e.g. linking with -Wl,--as-needed.


--
Regards,
Christian Ruppert

From c702537864f7e062d18f4ccce3e29d14d4ccf05f Mon Sep 17 00:00:00 2001
From: Christian Ruppert <id...@qasl.de>
Date: Thu, 30 Nov 2017 10:11:36 +0100
Subject: [PATCH] Fix LDFLAGS vs. LIBS re linking order

Signed-off-by: Christian Ruppert <id...@qasl.de>
---
 contrib/mod_defender/Makefile | 5 ++---
 contrib/modsecurity/Makefile  | 5 ++---
 contrib/spoa_example/Makefile | 5 ++---
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/contrib/mod_defender/Makefile b/contrib/mod_defender/Makefile
index ac17774d..efc7d7f6 100644
--- a/contrib/mod_defender/Makefile
+++ b/contrib/mod_defender/Makefile
@@ -28,9 +28,8 @@ EVENT_INC := /usr/include
 endif
 
 CFLAGS  += -g -Wall -pthread
-LDFLAGS += -lpthread  $(EVENT_LIB) -levent_pthreads -lapr-1 -laprutil-1 -lstdc++ -lm
 INCS += -I../../include -I../../ebtree -I$(MOD_DEFENDER_SRC) -I$(APACHE2_INC) -I$(APR_INC) -I$(EVENT_INC)
-LIBS =
+LIBS += -lpthread  $(EVENT_LIB) -levent_pthreads -lapr-1 -laprutil-1 -lstdc++ -lm
 
 CXXFLAGS = -g -std=gnu++11
 CXXINCS += -I$(MOD_DEFENDER_SRC) -I$(MOD_DEFENDER_SRC)/deps -I$(APACHE2_INC) -I$(APR_INC)
@@ -43,7 +42,7 @@ CXXSRCS = $(wildcard $(MOD_DEFENDER_SRC)/*.cpp)
 CXXOBJS = $(patsubst %.cpp, %.o, $(CXXSRCS))
 
 defender: $(OBJS) $(CXXOBJS)
-	$(LD) -o $@ $^ $(LDFLAGS) $(LIBS)
+	$(LD) $(LDFLAGS) -o $@ $^ $(LIBS)
 
 install: defender
 	install defender $(DESTDIR)$(BINDIR)
diff --git a/contrib/modsecurity/Makefile b/contrib/modsecurity/Makefile
index bb918c30..aa0d6e38 100644
--- a/contrib/modsecurity/Makefile
+++ b/contrib/modsecurity/Makefile
@@ -34,14 +34,13 @@ EVENT_INC := /usr/include
 endif
 
 CFLAGS  += -g -Wall -pthread
-LDFLAGS += -lpthread  $(EVENT_LIB) -levent_pthreads -lcurl -lapr-1 -laprutil-1 -lxml2 -lpcre -lyajl
 INCS += -I../../include -I../../ebtree -I$(MODSEC_INC) -I$(APACHE2_INC) -I$(APR_INC) -I$(LIBXML_INC) -I$(EVENT_INC)
-LIBS =
+LIBS += -lpthread  $(EVENT_LIB) -levent_pthreads -lcurl -lapr-1 -laprutil-1 -lxml2 -lpcre -lyajl
 
 OBJS = spoa.o modsec_wrapper.o
 
 modsecurity: $(OBJS)
-	$(LD) $(LDFLAGS) $(LIBS) -o $@ $^ $(MODSEC_LIB)/standalone.a
+	$(LD) $(LDFLAGS) -o $@ $^ $(MODSEC_LIB)/standalone.a $(LIBS)
 
 install: modsecurity
 	install modsecurity $(DESTDIR)$(BINDIR)
diff --git a/contrib/spoa_example/Makefile b/contrib/spoa_example/Makefile
index d04a01e1..c44c2b87 100644
--- a/contrib/spoa_example/Makefile
+++ b/contrib/spoa_example/Makefile
@@ -6,15 +6,14 @@ CC = gcc
 LD = $(CC)
 
 CFLAGS  = -g -O2 -Wall -Werror -pthread
-LDFLAGS = -lpthread -levent -levent_pthreads
 INCS += -I../../ebtree -I./include
-LIBS =
+LIBS = -lpthread -levent -levent_pthreads
 
 OBJS = spoa.o
 
 
 spoa: $(OBJS)
-	$(LD) $(LDFLAGS) $(LIBS) -o $@ $^
+	$(LD) $(LDFLAGS) -o $@ $^ $(LIBS)
 
 install: spoa
 	install spoa $(DESTDIR)$(BINDIR)
-- 
2.13.6



Removing the cpu-map cpu/-set limit?

2017-02-14 Thread Christian Ruppert

Hi,

there is currently a limit for both the process number itself and the
cpu/-set. Wouldn't it make sense to lift the cpu/-set limit?

Example: I've got an 18 Core / 36 Thread (72 with HT) CPU.
Now I want to use the non-HT threads (e.g. all even threads), for which I
may have to go beyond the limit of 64, or use just the ones >64, which is
currently not possible. Is there any reason to limit the cpu/-set value
instead of just the number of processes?
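To make the example concrete, here is a small sketch of why the even-numbered threads of such a box do not fit under a limit of 64. It assumes that the HT siblings are the odd-numbered CPUs (this depends on the actual topology and should be checked on the machine) and that the limit means "CPU numbers below 64"; the function names are made up for illustration.

```python
def even_cpus(total_threads):
    """CPU numbers 0, 2, 4, ... below total_threads (assumed non-HT threads)."""
    return [cpu for cpu in range(total_threads) if cpu % 2 == 0]


def fits_in_limit(cpus, limit=64):
    """True if every CPU number is below the assumed cpu-set limit."""
    return all(cpu < limit for cpu in cpus)


cpus = even_cpus(72)               # 72 hardware threads as in the example
too_high = [cpu for cpu in cpus if cpu >= 64]
```

With 72 threads, too_high holds the CPU numbers that cannot be addressed, although only 36 processes would be needed.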


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

On 2016-11-25 15:26, Willy Tarreau wrote:

On Fri, Nov 25, 2016 at 02:44:35PM +0100, Christian Ruppert wrote:
I have a default bind for process 1 which is basically the http frontend and
the actual backend, RSA is bound to another, single process and ECC is bound
to all the rest. So in this case SSL (in particular ECC) is the problem. The
connections/handshakes should be *actually* using CPU+2 till NCPU.


That's exactly what I'm talking about, look, you have this :

  frontend ECC
 bind-process 3-36
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC
 mode http
 default_backend bk_ram

It creates a single socket (hence a single queue) and shares it between
all processes. Thus each incoming connection will wake up all processes
not doing anything, and the first one capable of grabbing it will take
it as well as a few following ones if any. You end up with a very
unbalanced load making it hard to scale.

Instead you can do this :

  frontend ECC
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 3
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 4
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 5
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 6
 ...
 bind :65420 ssl crt /etc/haproxy/test.pem-ECC process 36
 mode http
 default_backend bk_ram

You'll really have 34 listening sockets all fairly balanced with their
own queue. You can generally achieve higher loads this way and with a
lower average latency.
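Writing 34 nearly identical bind lines by hand is error-prone, so they can be generated. This is only a sketch of the idea: the helper name per_process_binds is made up, and the address, certificate path and process range are simply the ones from the example above.

```python
def per_process_binds(address, options, first_proc, last_proc):
    """Return one 'bind ... process N' line per process in the given range."""
    return [
        f"bind {address} {options} process {n}"
        for n in range(first_proc, last_proc + 1)
    ]


lines = per_process_binds(":65420", "ssl crt /etc/haproxy/test.pem-ECC", 3, 36)
```

Printing "\n".join(lines) gives the block to paste into the frontend; for processes 3 to 36 that is 34 lines, one listening socket each.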

Also, I tend to bind network IRQs to the same cores as those doing SSL
because you hardly have the two at once. SSL is not able to deal with
traffic capable of saturating a NIC driver, so when SSL saturates the
CPU you have little traffic and when the NIC requires all the CPU for
high traffic, you know there's little SSL.

Cheers,
Willy


Ah! Thanks! I had to remove the default "bind-process 1", or alternatively
also set "bind-process 3-36" in the ECC frontend, though. I guess it's the
same in the end. Anyway, the IRQ/NIC problem was still the same. I'll
set it up that way anyway if that's better, together with the Intel
affinity script or, as you said, with the IRQs bound to the cores that do
SSL. Let's see how well that performs.


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

On 2016-11-25 14:44, Christian Ruppert wrote:

Hi Willy,

On 2016-11-25 14:30, Willy Tarreau wrote:

Hi Christian,

On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote:
I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.


Also you definitely need to split your bind lines, one per process, to
take advantage of the kernel's ability to load balance between multiple
queues. Otherwise the load is always unequal and many processes are woken
up for nothing.


I have a default bind for process 1 which is basically the http
frontend and the actual backend, RSA is bound to another, single
process and ECC is bound to all the rest. So in this case SSL (in
particular ECC) is the problem. The connections/handshakes should be
*actually* using CPU+2 till NCPU. The only shared part should be the
backend but that should be actually no problem for e.g. 5 parallel
benchmarks as a single HTTP benchmark can make >20k requests/s.

global
nbproc 36

defaults
bind-process 1

frontend http
bind :65410
mode http
default_backend bk_ram

frontend ECC
bind-process 3-36
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000
errorfile 503 /etc/haproxy/test.error




Regards,
Willy


It seems to be the NIC or rather the driver/kernel. Using Intel's
set_irq_affinity (set_irq_affinity -x local eth2 eth3) seems to do the
trick, at least at first glance.


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

Hi Willy,

On 2016-11-25 14:30, Willy Tarreau wrote:

Hi Christian,

On Fri, Nov 25, 2016 at 12:12:06PM +0100, Christian Ruppert wrote:
I'll compare HT/no-HT afterwards. In my first tests it didn't seem to make
much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.


Also you definitely need to split your bind lines, one per process, to
take advantage of the kernel's ability to load balance between multiple
queues. Otherwise the load is always unequal and many processes are woken
up for nothing.


I have a default bind for process 1 which is basically the http frontend 
and the actual backend, RSA is bound to another, single process and ECC 
is bound to all the rest. So in this case SSL (in particular ECC) is the 
problem. The connections/handshakes should be *actually* using CPU+2 
till NCPU. The only shared part should be the backend but that should be 
actually no problem for e.g. 5 parallel benchmarks as a single HTTP 
benchmark can make >20k requests/s.


global
nbproc 36

defaults
bind-process 1

frontend http
bind :65410
mode http
default_backend bk_ram

frontend ECC
bind-process 3-36
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000
errorfile 503 /etc/haproxy/test.error




Regards,
Willy


--
Regards,
Christian Ruppert



Re: SSL/ECC and nbproc >1

2016-11-25 Thread Christian Ruppert

Hi Conrad,

On 2016-10-21 17:39, Conrad Hoffmann wrote:

Hi,

it's a lot of information, and I don't have time to go into all details
right now, but from a quick read, here are the things I noticed:

- Why nbproc 64? Your CPU has 18 cores (36 w/ HT), so more procs than that
will likely make performance rather worse. HT cores share the cache, so
using 18 might make most sense (see also below). It's best to experiment a
little with that and measure the results, though.


I'll compare HT/no-HT afterwards. In my first tests it didn't seem to
make much of a difference so far.
I also tried (in this case) to disable HT entirely and set it to max. 36
procs. Basically the same as before.




- If you see ksoftirq eating up a lot of one CPU, then your box is most
likely configured to process all IRQs on the first core. Most NICs these
days can be configured to use several IRQs, which you can then distribute
across all cores, smoothing the workload across cores significantly.


I'll try to get a more recent distro (it's still Debian Wheezy) with a
newer driver etc. They seem to have added some IRQ options in more
recent versions of ixgbe. The kernel could also be related.

So disabling HT did not help.
nginx seems to have a similar problem, by the way, so I guess it's neither
HAProxy nor nginx.




- Consider using "bind-process" to lock the processes to a single core (but
make sure to leave out the HT cores, or disable HT altogether). Less
context switching might improve performance.

Hope that helps,
Conrad



On 10/21/2016 04:47 PM, Christian Ruppert wrote:

Hi,

again a performance topic.
I did some further testing/benchmarks with ECC and nbproc >1. I was testing
on a "E5-2697 v4" and the first thing I noticed was that HAProxy has a
fixed limit of 64 for nbproc. So the setup:

HAProxy server with the mentioned E5:
global
user haproxy
group haproxy
maxconn 75000
log 127.0.0.2 local0
ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDH
ssl-default-bind-options no-sslv3 no-tls-tickets
tune.ssl.default-dh-param 1024

nbproc 64

defaults
timeout client 300s
timeout server 300s
timeout queue 60s
timeout connect 7s
timeout http-request 10s
maxconn 75000

bind-process 1

# HTTP
frontend haproxy_test_http
bind :65410
mode http
option httplog
option httpclose
log global
default_backend bk_ram

# ECC
frontend haproxy_test-ECC
bind-process 3-64
bind :65420 ssl crt /etc/haproxy/test.pem-ECC
mode http
option httplog
option httpclose
log global
default_backend bk_ram

backend bk_ram
mode http
fullconn 75000 # Just in case the lower default limit will be reached...
errorfile 503 /etc/haproxy/test.error



/etc/haproxy/test.error:
HTTP/1.0 200
Cache-Control: no-cache
Connection: close
Content-Type: text/plain

Test123456


The ECC key:
openssl ecparam -genkey -name prime256v1 -out /etc/haproxy/test.pem-ECC.key
openssl req -new -sha256 -key /etc/haproxy/test.pem-ECC.key -days 365 \
    -nodes -x509 -sha256 -subj "/O=ECC Test/CN=test.example.com" \
    -out /etc/haproxy/test.pem-ECC.crt
cat /etc/haproxy/test.pem-ECC.key /etc/haproxy/test.pem-ECC.crt > \
    /etc/haproxy/test.pem-ECC


So then I tried a local "ab":
ab -n 5000 -c 250 https://127.0.0.1:65420/
Server Hostname:        127.0.0.1
Server Port:            65420
SSL/TLS Protocol:       TLSv1/SSLv3,ECDHE-ECDSA-AES128-GCM-SHA256,256,128

Document Path:          /
Document Length:        107 bytes

Concurrency Level:      250
Time taken for tests:   3.940 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Non-2xx responses:      5000
Total transferred:      106 bytes
HTML transferred:       535000 bytes
Requests per second:    1268.95 [#/sec] (mean)
Time per request:       197.013 [ms] (mean)
Time per request:       0.788 [ms] (mean, across all concurrent requests)
Transfer rate:          262.71 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       54  138  34.7    162     193
Processing:     8   51  34.8     24     157
Waiting:        3   40  31.6     18     113
Total:        177  189   7.5    188     333

Percentage of the requests served within a certain time (ms)
  50%    188
  66%    189
  75%    190
  80%    190
  90%    191
  95%    192
  98%    196
  99%    205
 100%    333 (longest request)

The same test with just nbproc 1 was about ~1500 requests/s. So 1.5k *
nbproc would have been what I expected, at least somewhere near that value.

Then I set up 61 EC2 instances, standard setup t2-micro. They're somewhat
slower with ~1k ECC requests per second but that's ok for the test.
HTTP (one proc) via localhost was around 27-28k r/s, remote (EC2) ~450

SSL/ECC and nbproc >1

2016-10-21 Thread Christian Ruppert
ported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1t  3 May 2016 (VERSIONS DIFFER!)

OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.30 2012-02-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

I actually thought I was already using 1.6.9 on that host, so I just
upgraded and ran some benchmarks again, but at first glance the results
look almost the same.


--
Regards,
Christian Ruppert



Stale UNIX sockets after reload

2016-05-09 Thread Christian Ruppert

Hi,

it seems that HAProxy does not remove UNIX sockets after a reload (and
after a restart as well?) even though they have been removed from the
configuration, so the socket files are left behind as stale.
At least 1.6.4 seems to be affected. Can anybody else confirm that? It's
a multi-process setup in this case, but it also happens with binds bound
to just one process.
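Until that is sorted out, leftover sockets can at least be spotted by comparing the socket files on disk with the ones the configuration still references. A rough sketch, with several assumptions: the helper names are made up, only "unix@/path" addresses and "stats socket /path" lines are recognized (a bind to a bare "/path" would be missed), and the list of existing socket files has to be collected separately.

```python
import re


def configured_unix_sockets(config_text):
    """Collect paths from 'unix@/path' addresses and 'stats socket /path' lines."""
    paths = set(re.findall(r"unix@(/\S+)", config_text))
    paths.update(
        re.findall(r"^\s*stats\s+socket\s+(/\S+)", config_text, re.MULTILINE)
    )
    return paths


def stale_sockets(existing_paths, config_text):
    """Return socket paths that exist on disk but are no longer configured."""
    return sorted(set(existing_paths) - configured_unix_sockets(config_text))
```

Anything the second function returns is a candidate for manual cleanup after the reload; it should be double-checked before deleting, since another service may own the file.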


--
Regards,
Christian Ruppert



Re: Sharing SSL information via PROXY protocol or HAProxy internally

2016-04-16 Thread Christian Ruppert

Hi Dennis,

On 2016-04-16 02:13, Dennis Jacobfeuerborn wrote:

On 15.04.2016 16:01, Christian Ruppert wrote:

Hi,

would it be possible to inherit the SSL information from a SSL
listener/frontend via PROXY protocol?
So for example:

listen ssl-relay
mode tcp

...

server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2

listen ssl-rsa_ecc
mode tcp

...

bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt SSl-RSA.PEM user haproxy

frontend http_https
bind :80 # http
bind unix@/var/run/haproxy_ssl.sock accept-proxy user haproxy # https

redirect scheme https code 301 if !{ ssl_fc }


Here the ssl_fc and other SSL related ACLs do not work because the
actual SSL termination has been done in the above ssl-rsa_ecc listener.
Sharing that either internally or via the PROXY protocol would be really
handy, if that's possible.
For now we use the bind "id" to check whether it's the proxy connection
or not but the above would be much easier/better IMHO.


For this specific case of http to https redirect I use the
X-Forwarded-Proto header. In the ssl frontend I do this:

http-request set-header X-Forwarded-Proto https

and in the plain http frontend I do this:

http-request redirect scheme https if !{ req.hdr(X-Forwarded-Proto) https }


The problem here is that a client could set that header in a plain HTTP
request as well and thereby avoid the redirect, or whatever else you
decide based on it. You may also want the other SSL data: cipher,
version etc. Since 1.6 you can set variables, ok, but somehow passing
that kind of information along could be really useful, I guess.




You usually need to set this header anyway so the application knows it
needs to generate https URLs in the generated HTML.

Regards,
  Dennis


--
Regards,
Christian Ruppert



Sharing SSL information via PROXY protocol or HAProxy internally

2016-04-15 Thread Christian Ruppert

Hi,

would it be possible to inherit the SSL information from a SSL 
listener/frontend via PROXY protocol?

So for example:

listen ssl-relay
mode tcp

...

server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2

listen ssl-rsa_ecc
mode tcp

...

bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt SSl-RSA.PEM user haproxy


frontend http_https
bind :80 # http
bind unix@/var/run/haproxy_ssl.sock accept-proxy user haproxy # https

redirect scheme https code 301 if !{ ssl_fc }


Here the ssl_fc and other SSL related ACLs do not work because the 
actual SSL termination has been done in the above ssl-rsa_ecc listener. 
Sharing that either internally or via the PROXY protocol would be really 
handy, if that's possible.
For now we use the bind "id" to check whether it's the proxy connection 
or not but the above would be much easier/better IMHO.


--
Regards,
Christian Ruppert



Re: nbproc 1 vs >1 performance

2016-04-15 Thread Christian Ruppert

On 2016-04-14 11:06, Christian Ruppert wrote:

Hi Willy,

On 2016-04-14 10:17, Willy Tarreau wrote:

On Thu, Apr 14, 2016 at 08:55:47AM +0200, Lukas Tribus wrote:

Let me put it this way:

frontend haproxy_test
 bind-process 1-8
 bind :12345 process 1
 bind :12345 process 2
 bind :12345 process 3
 bind :12345 process 4


Leads to 8 processes, and the master process binds the socket 4 times (PID
16509):


(...)

lukas@ubuntuvm:~/haproxy-1.5$ sudo netstat -tlp | grep hap
tcp        0      0 *:12345     *:*     LISTEN      16509/haproxy
tcp        0      0 *:12345     *:*     LISTEN      16509/haproxy
tcp        0      0 *:12345     *:*     LISTEN      16509/haproxy
tcp        0      0 *:12345     *:*     LISTEN      16509/haproxy

lukas@ubuntuvm:~/haproxy-1.5$


OK so it's netstat which gives a wrong report, I have the same here. I verified
in /proc/$PID/fd/ and I properly saw the FDs. Next, "ss -anp" also shows all the
process list :

  LISTEN 0  128   *:12345   *:*   users:(("haproxy",25360,7),("haproxy",25359,7),("haproxy",25358,7),("haproxy",25357,7),("haproxy",25356,7),("haproxy",25355,7),("haproxy",25354,7),("haproxy",25353,7))
  LISTEN 0  128   *:12345   *:*   users:(("haproxy",25360,6),("haproxy",25359,6),("haproxy",25358,6),("haproxy",25357,6),("haproxy",25356,6),("haproxy",25355,6),("haproxy",25354,6),("haproxy",25353,6))
  LISTEN 0  128   *:12345   *:*   users:(("haproxy",25360,5),("haproxy",25359,5),("haproxy",25358,5),("haproxy",25357,5),("haproxy",25356,5),("haproxy",25355,5),("haproxy",25354,5),("haproxy",25353,5))
  LISTEN 0  128   *:12345   *:*   users:(("haproxy",25360,4),("haproxy",25359,4),("haproxy",25358,4),("haproxy",25357,4),("haproxy",25356,4),("haproxy",25355,4),("haproxy",25354,4),("haproxy",25353,4))

A performance test also shows a fair distribution of the load :

  25353 willy 20   0 21872 4216 1668 S   26  0.1   0:04.54 haproxy
  25374 willy 20   0  7456  1080 S   25  0.0   0:02.26 injectl464
  25376 willy 20   0  7456  1080 S   25  0.0   0:02.27 injectl464
  25377 willy 20   0  7456  1080 S   25  0.0   0:02.26 injectl464
  25375 willy 20   0  7456  1080 S   24  0.0   0:02.26 injectl464
  25354 willy 20   0 21872 4168 1620 R   22  0.1   0:04.51 haproxy
  25356 willy 20   0 21872 4216 1668 R   22  0.1   0:04.21 haproxy
  25355 willy 20   0 21872 4168 1620 S   21  0.1   0:04.38 haproxy

However, as you can see these sockets are still bound to all processes and
that's not a good idea in the multi-queue mode.

I have added a few debug lines in enable_listener() like this :

$ git diff
diff --git a/src/listener.c b/src/listener.c
index 5abeb80..59c51a1 100644
--- a/src/listener.c
+++ b/src/listener.c
@@ -49,6 +49,7 @@ static struct bind_kw_list bind_keywords = {
  */
 void enable_listener(struct listener *listener)
 {
+       fddebug("%d: enabling fd %d\n", getpid(), listener->fd);
        if (listener->state == LI_LISTEN) {
                if ((global.mode & (MODE_DAEMON | MODE_SYSTEMD)) &&
                    listener->bind_conf->bind_proc &&
@@ -57,6 +58,7 @@ void enable_listener(struct listener *listener)
                         * want any fd event to reach it.
                         */
                        fd_stop_recv(listener->fd);
+                       fddebug("%d: pausing fd %d\n", getpid(), listener->fd);
                        listener->state = LI_PAUSED;
                }
                else if (listener->nbconn < listener->maxconn) {

And we're seeing this upon startup for processes 25746..25755 :

As you can see, the FDs are properly enabled and then paused for the
unavailable ones.

willy@wtap:haproxy$ grep 4294967295 log | grep 25746
25746 write(4294967295, "25746: enabling fd 4\n", 21 
25746 write(4294967295, "25746: enabling fd 5\n", 21 
25746 write(4294967295, "25746: pausing fd 5\n", 20) = -1 EBADF (Bad file descriptor)
25746 write(4294967295, "25746: enabling fd 6\n", 21) = -1 EBADF (Bad file descriptor)
25746 write(4294967295, "25746: pausing fd 6\n", 20) = -1 EBADF (Bad file descriptor)
25746 write(4294967295, "25746: enabling fd 7\n", 21 
25746 write(4294967295, "25746: pausing fd 7\n", 20 
willy@wtap:haproxy$ grep 4294967295 log | grep 25747
25747 write(4294967295, "25747: enabling fd 4\n", 21 
25747 write(4294967295, "25747: pausing fd 4\n", 20 
25747 write(4294967295

Re: nbproc 1 vs >1 performance

2016-04-14 Thread Christian Ruppert
25747 write(4294967295, "25747: enabling fd 6\n", 21 
25747 write(4294967295, "25747: pausing fd 6\n", 20 
25747 write(4294967295, "25747: enabling fd 7\n", 21 
25747 write(4294967295, "25747: pausing fd 7\n", 20 
willy@wtap:haproxy$ grep 4294967295 log | grep 25748
25748 write(4294967295, "25748: enabling fd 4\n", 21 
25748 write(4294967295, "25748: pausing fd 4\n", 20 
25748 write(4294967295, "25748: enabling fd 5\n", 21 
25748 write(4294967295, "25748: pausing fd 5\n", 20 
25748 write(4294967295, "25748: enabling fd 6\n", 21 
25748 write(4294967295, "25748: enabling fd 7\n", 21 
25748 write(4294967295, "25748: pausing fd 7\n", 20 
willy@wtap:haproxy$ grep 4294967295 log | grep 25749
25749 write(4294967295, "25749: enabling fd 4\n", 21 
25749 write(4294967295, "25749: pausing fd 4\n", 20 
25749 write(4294967295, "25749: enabling fd 5\n", 21 
25749 write(4294967295, "25749: pausing fd 5\n", 20 
25749 write(4294967295, "25749: enabling fd 6\n", 21 
25749 write(4294967295, "25749: pausing fd 6\n", 20 
25749 write(4294967295, "25749: enabling fd 7\n", 21 
willy@wtap:haproxy$ grep 4294967295 log | grep 25750
25750 write(4294967295, "25750: enabling fd 4\n", 21 
25750 write(4294967295, "25750: pausing fd 4\n", 20 
25750 write(4294967295, "25750: enabling fd 5\n", 21 
25750 write(4294967295, "25750: pausing fd 5\n", 20 
25750 write(4294967295, "25750: enabling fd 6\n", 21 
25750 write(4294967295, "25750: pausing fd 6\n", 20 
25750 write(4294967295, "25750: enabling fd 7\n", 21 
25750 write(4294967295, "25750: pausing fd 7\n", 20 


Now with the following patch to completely unbind such listeners :

diff --git a/src/listener.c b/src/listener.c
index 5abeb80..0296d50 100644
--- a/src/listener.c
+++ b/src/listener.c
@@ -56,8 +57,7 @@ void enable_listener(struct listener *listener)
                        /* we don't want to enable this listener and don't
                         * want any fd event to reach it.
                         */
-                       fd_stop_recv(listener->fd);
-                       listener->state = LI_PAUSED;
+                       unbind_listener(listener);
                }
                else if (listener->nbconn < listener->maxconn) {
                        fd_want_recv(listener->fd);


I get this which is much cleaner :

LISTEN 0  128   *:12345   *:*  users:(("haproxy",25949,7))
LISTEN 0  128   *:12345   *:*  users:(("haproxy",25948,6))
LISTEN 0  128   *:12345   *:*  users:(("haproxy",25947,5))
LISTEN 0  128   *:12345   *:*  users:(("haproxy",25946,4))

So I guess that indeed, if not all the processes a frontend is bound to
have a corresponding bind line, this can cause connection issues as some
incoming connections will be distributed to queues that nobody listens to.
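One way to check this on a running box is to count the owners of each listening socket in the "ss -anp" output. What follows is only a minimal Python sketch: the function name owners_per_socket is made up for this example, and it assumes the older users:(("name",pid,fd)) layout shown in this thread (newer ss versions print pid=... and fd=... instead).

```python
import re

# Matches one ("name",pid,fd) entry inside the users:(( ... )) field.
USER_RE = re.compile(r'\("([^"]+)",(\d+),(\d+)\)')

def owners_per_socket(ss_output):
    """Return one list of (name, pid, fd) tuples per line carrying a users:(...) field."""
    sockets = []
    for line in ss_output.splitlines():
        if "users:(" not in line:
            continue
        sockets.append([(name, int(pid), int(fd))
                        for name, pid, fd in USER_RE.findall(line)])
    return sockets

sample = '''LISTEN 0  128   *:12345 *:*  users:(("haproxy",25949,7))
LISTEN 0  128   *:12345 *:*  users:(("haproxy",25948,6))'''

for owners in owners_per_socket(sample):
    # A socket owned by exactly one process is what the unbind patch aims for.
    print(len(owners), owners)
```

With the unpatched output, every socket would list all eight haproxy PIDs; with the patch each socket should list one.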


I'm willing to commit this patch to make things cleaner and more reliable.
Here I'm getting the exact same performance with and without. Christian, you
may want to apply it by hand to test if it improves the behaviour for you.


First of all thanks for the quick response!

I've applied your patch and so far I've only looked at the performance. It
is unchanged: haproxy-lessperformant.cfg is still slower than
haproxy-moreperformant.cfg, so from the performance point of view there's no
difference between with and without that patch.


We have an (IMHO) quite huge haproxy config with around 200 frontends,
~180 backends and ~90 listeners. So the intention was to combine an http
bind with an ssl bind in one frontend, so that we keep the config at the
minimum that is necessary to get it working properly:

listen ssl-relay
    mode tcp

    bind-process 2

    bind :443 process 2

    tcp-request inspect-delay 7s
    acl HAS_ECC req.ssl_ec_ext eq 1
    tcp-request content accept if { req_ssl_hello_type 1 } # Client Hello

    use-server ecc if HAS_ECC
    server ecc unix@/var/run/haproxy_ssl_ecc.sock send-proxy-v2

    use-server rsa if !HAS_ECC
    server rsa unix@/var/run/haproxy_ssl_rsa.sock send-proxy-v2


frontend http_https_combined
    mode http

    bind-process 1-40

    bind :80 process 1
    bind unix@/var/run/haproxy_ssl_ecc.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-ECC user haproxy process 4-40
    bind unix@/var/run/haproxy_ssl_rsa.sock accept-proxy ssl crt /etc/haproxy/ssltest.pem-RSA user haproxy process 3

    ...

    default_backend somebackend


The plan was to have the same http performance as before, move SSL 
termination onto other/independent processes and furthermore split RSA 
and ECC. The "bind-process 1-40" is necessary because it's otherwise not 
possible to bind the SSL binds to the other processes.




Please also note that you'll get a build warning that first needs another
fix on listen_accept() which doesn't have the same prototype between the
.c and the .h (!). I'll handle it as well.

Cheers,
Willy


--
Regards,
Christian Ruppert



nbproc 1 vs >1 performance

2016-04-13 Thread Christian Ruppert

Hi,

I've prepared a simple testcase:

haproxy-moreperformant.cfg:
global
nbproc 40
user haproxy
group haproxy
maxconn 175000

defaults
timeout client 300s
timeout server 300s
timeout queue 60s
timeout connect 7s
timeout http-request 10s
maxconn 175000

bind-process 1

frontend haproxy_test
#bind-process 1-40
bind :12345 process 1

mode http

default_backend backend_test

backend backend_test
mode http

errorfile 503 /etc/haproxy/test.error

# vim: set syntax=haproxy:


haproxy-lessperformant.cfg:
global
nbproc 40
user haproxy
group haproxy
maxconn 175000

defaults
timeout client 300s
timeout server 300s
timeout queue 60s
timeout connect 7s
timeout http-request 10s
maxconn 175000

bind-process 1

frontend haproxy_test
bind-process 1-40
bind :12345 process 1

mode http

default_backend backend_test

backend backend_test
mode http

errorfile 503 /etc/haproxy/test.error

# vim: set syntax=haproxy:

/etc/haproxy/test.error:
HTTP/1.0 200
Cache-Control: no-cache
Content-Type: text/plain

Test123456


The test:
ab -n 5000 -c 250 http://xx.xx.xx.xx:12345

With the first config I get around 30-33k requests/s on my test system,
with the second config (only the bind-process in the frontend section has
been changed!) I just get around 26-28k requests per second.


I could get similar differences when playing with nbproc 1 and >1 as
well as the default "bind-process" and/or the "process 1" on the actual
bind.
Is it really just the multi-process overhead causing the performance
drop here, even though the bind uses the first / only one process anyway?


--
Regards,
Christian Ruppert



Re: Weird stick-tables / peers behaviour

2016-03-29 Thread Christian Ruppert

On 2016-03-29 15:13, Christian Ruppert wrote:

On 2016-03-29 10:58, Christian Ruppert wrote:

Hi Willy,

On 2016-03-25 18:17, Willy Tarreau wrote:

On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote:
I think it's even different (but could be wrong) since Christian spoke
about counters suddenly doubling. The issue you faced Sylvain which I
still have no idea how to fix unfortunately is that the peers applet
is not always woken up when a connection establishes on the other side
and it may simply miss an event, resulting in everything remaining
stable and appear frozen until the connection closes. Here it seems
data are exchanged but incorrect. This one could be easier to reproduce
however, we'll check.


OK I found it. Indeed it was easy to reproduce. The frequency counters
are sent as "now - freq.date", which is a positive age compared to the
current date. But on receipt, this age was *added* to the current date
instead of subtracted. So since the date was always in the future, they
were always expired if the activity changed side in less than the
counter's measuring period (eg: 10s).

I'm committing this simple fix that you can apply to your tree for now.


Cheers,
Willy

diff --git a/src/peers.c b/src/peers.c
index c29ea73..9918dac 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -1153,7 +1153,7 @@ switchstate:
                                case STD_T_FRQP: {
                                        struct freq_ctr_period data;
 
-                                       data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end));
+                                       data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end));
                                        if (!msg_cur) {
                                                /* malformed message */
                                                appctx->st0 = PEER_SESS_ST_ERRPROTO;


Thanks a lot for the fast investigation! The proposed patch seems to
do the trick :)


Hrm, or not. At least not completely.
There's still something wrong it seems:
20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601
gpc0=0 conn_cnt=682 conn_rate(1)=1 conn_cur=3 sess_cnt=1
sess_rate(1)=-1032058827 http_req_cnt=0 http_req_rate(1)=2272
http_err_cnt=3 http_err_rate(1)=1143800 bytes_in_cnt=0
bytes_out_cnt=247977
Note the sess_rate is a negative int. Some http_err_rate seems to be
affected as well. Even the http_req_rate seems to be still wrong, in
some cases.
20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259
gpc0=0 conn_cnt=86 conn_rate(1)=0 conn_cur=7 sess_cnt=0
sess_rate(1)=0 http_req_cnt=0 http_req_rate(1)=349038424
http_err_cnt=6 http_err_rate(1)=0 bytes_in_cnt=0
bytes_out_cnt=3261818950
We're using httpclose so in this case it *actually* should match the
conn_cnt so 86.


I haven't had enough time to dig into it yet, but it looks like I had one
case where now_ms (?) was used as the value. If that gets added on top of
the counter, it might explain the integer overflow within http_sess_rate.
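To make the negative number easier to reason about: if the rate is held as an unsigned 32-bit quantity but printed as a signed int, anything above 2^31-1 shows up negative. The Python sketch below only illustrates that reinterpretation. Whether the table dump really prints the field this way is an assumption, not something checked against the HAProxy source.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF

def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

print(as_unsigned32(-1032058827))   # 3262908469
print(as_signed32(3262908469))      # -1032058827
```

So the sess_rate(1) shown above would correspond to roughly 3.26 billion as an unsigned value, far too large for a real per-period session rate.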


--
Regards,
Christian Ruppert



Re: Weird stick-tables / peers behaviour

2016-03-29 Thread Christian Ruppert

On 2016-03-29 10:58, Christian Ruppert wrote:

Hi Willy,

On 2016-03-25 18:17, Willy Tarreau wrote:

On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote:
I think it's even different (but could be wrong) since Christian spoke
about counters suddenly doubling. The issue you faced Sylvain which I
still have no idea how to fix unfortunately is that the peers applet
is not always woken up when a connection establishes on the other side
and it may simply miss an event, resulting in everything remaining
stable and appear frozen until the connection closes. Here it seems
data are exchanged but incorrect. This one could be easier to reproduce
however, we'll check.


OK I found it. Indeed it was easy to reproduce. The frequency counters
are sent as "now - freq.date", which is a positive age compared to the
current date. But on receipt, this age was *added* to the current date
instead of subtracted. So since the date was always in the future, they
were always expired if the activity changed side in less than the
counter's measuring period (eg: 10s).

I'm committing this simple fix that you can apply to your tree for now.

Cheers,
Willy

diff --git a/src/peers.c b/src/peers.c
index c29ea73..9918dac 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -1153,7 +1153,7 @@ switchstate:
                                case STD_T_FRQP: {
                                        struct freq_ctr_period data;
 
-                                       data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end));
+                                       data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end));
                                        if (!msg_cur) {
                                                /* malformed message */
                                                appctx->st0 = PEER_SESS_ST_ERRPROTO;


Thanks a lot for the fast investigation! The proposed patch seems to
do the trick :)


Hrm, or not. At least not completely.
There's still something wrong it seems:
20160329 15:07:03: 0x3bca858: key=xx.xx.xx.xx use=0 exp=28799601 gpc0=0 
conn_cnt=682 conn_rate(1)=1 conn_cur=3 sess_cnt=1 
sess_rate(1)=-1032058827 http_req_cnt=0 http_req_rate(1)=2272 
http_err_cnt=3 http_err_rate(1)=1143800 bytes_in_cnt=0 
bytes_out_cnt=247977
Note the sess_rate is a negative int. Some http_err_rate seems to be 
affected as well. Even the http_req_rate seems to be still wrong, in 
some cases.
20160329 15:11:38: 0x3e67318: key=xx.xx.xx.xx use=0 exp=28605259 gpc0=0 
conn_cnt=86 conn_rate(1)=0 conn_cur=7 sess_cnt=0 sess_rate(1)=0 
http_req_cnt=0 http_req_rate(1)=349038424 http_err_cnt=6 
http_err_rate(1)=0 bytes_in_cnt=0 bytes_out_cnt=3261818950
We're using httpclose so in this case it *actually* should match the 
conn_cnt so 86.


--
Regards,
Christian Ruppert



Re: Weird stick-tables / peers behaviour

2016-03-29 Thread Christian Ruppert

Hi Willy,

On 2016-03-25 18:17, Willy Tarreau wrote:

On Fri, Mar 25, 2016 at 01:53:50PM +0100, Willy Tarreau wrote:

I think it's even different (but could be wrong) since Christian spoke
about counters suddenly doubling. The issue you faced Sylvain which I
still have no idea how to fix unfortunately is that the peers applet
is not always woken up when a connection establishes on the other side
and it may simply miss an event, resulting in everything remaining
stable and appear frozen until the connection closes. Here it seems
data are exchanged but incorrect. This one could be easier to reproduce
however, we'll check.


OK I found it. Indeed it was easy to reproduce. The frequency counters
are sent as "now - freq.date", which is a positive age compared to the
current date. But on receipt, this age was *added* to the current date
instead of subtracted. So since the date was always in the future, they
were always expired if the activity changed side in less than the
counter's measuring period (eg: 10s).
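The sign mistake can be pictured with a few lines of Python. This is just a sketch under simplifying assumptions: the function names are invented for illustration, plain integers stand in for HAProxy's wrapping 32-bit millisecond ticks, and the numbers are made up.

```python
def encode_age(sender_now_ms, curr_tick_ms):
    """Sender side: ship the age of the counter's window start, not an absolute date."""
    return sender_now_ms - curr_tick_ms

def decode_tick_buggy(receiver_now_ms, age_ms):
    # Age added: the reconstructed window start lies in the receiver's future.
    return receiver_now_ms + age_ms

def decode_tick_fixed(receiver_now_ms, age_ms):
    # Age subtracted: the reconstructed window start lies in the receiver's past.
    return receiver_now_ms - age_ms

sender_now, window_start = 50_000, 47_000      # window started 3 s ago on the sender
age = encode_age(sender_now, window_start)     # 3000 ms

receiver_now = 900_000                         # peers do not share a clock origin
print(decode_tick_fixed(receiver_now, age))    # 897000
print(decode_tick_buggy(receiver_now, age))    # 903000
```

Sending an age rather than a date is what lets two peers with unrelated tick origins agree on how old a window is; the receiver just has to apply it in the right direction.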

I'm committing this simple fix that you can apply to your tree for now.

Cheers,
Willy

diff --git a/src/peers.c b/src/peers.c
index c29ea73..9918dac 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -1153,7 +1153,7 @@ switchstate:
                                case STD_T_FRQP: {
                                        struct freq_ctr_period data;
 
-                                       data.curr_tick = tick_add(now_ms, intdecode(&msg_cur, msg_end));
+                                       data.curr_tick = tick_add(now_ms, -intdecode(&msg_cur, msg_end));
                                        if (!msg_cur) {
                                                /* malformed message */
                                                appctx->st0 = PEER_SESS_ST_ERRPROTO;


Thanks a lot for the fast investigation! The proposed patch seems to do 
the trick :)


--
Regards,
Christian Ruppert



Re: src_get_gpc0 seems not to work after commit f71f6f6

2016-03-24 Thread Christian Ruppert

Hi Seri,

On 2016-03-23 08:40, Sehoon Kim wrote:

Hi,

As below, I use stick-table for temporary acl.
After commit f71f6f6, src_get_gpc0 seems not to work.

So, I revert commit f71f6f6, and it works!!


That's not a valid commit in the official haproxy repo, can you please 
check the hash again?




frontend SSL-Offload
bind :443 ssl crt ssl.pem ecdhe prime256v1

tcp-request connection accept if { src_get_gpc0(whitelist) eq 1 }
tcp-request connection reject

backend whitelist
stick-table type ip size 1m expire 1h nopurge store gpc0

Thanks

Seri


--
Regards,
Christian Ruppert



Weird stick-tables / peers behaviour

2016-03-24 Thread Christian Ruppert

Hi all,

I've just upgraded some hosts to 1.6.4 (from 1.5) and immediately got a 
bunch of SMS because we're using stick-tables to track the connections 
and monitor http_req_rate. The stick-tables data will be synced to the 
other peers using the "peers" section.

So I setup a test case using two HAProxy instances with e.g.:
global
    user haproxy
    group haproxy
    maxconn 1
    stats socket /var/run/haproxy.stat user haproxy gid haproxy mode 600 level admin


# from the anti-DoS config
defaults
    timeout client 60s
    timeout server 60s
    timeout queue 60s
    timeout connect 3s
    timeout http-request 10s


frontend test
    bind 0.0.0.0:8080
    mode http

    tcp-request inspect-delay 7s
    tcp-request content track-sc1 src table backend_sourceip
    tcp-request content reject if { sc1_http_req_rate(backend_sourceip) gt 15 }

    http-request deny if { sc1_http_req_rate(backend_sourceip) gt 15 }


peers foo_peers
    peer host1 172.16.0.128:8024
    peer host2 172.16.0.16:8024

backend backend_sourceip
    # 1 million IPs, 8h TTL per entry for several stats per IP in 10s
    stick-table type ip size 1m expire 8h store gpc0,conn_cnt,conn_cur,conn_rate(10s),http_req_cnt,http_req_rate(10s),http_err_cnt,http_err_rate(10s) peers foo_peers



I then have 4 terminals, two for doing a:
watch "echo 'show table backend_sourceip' | socat stdio /var/run/haproxy.stat"


and two for doing some "curl -Lvs http://127.0.0.1:8080" by hand.
If you do some on the first and some on the second host you'll notice
different values on one side. Also, the counter may e.g. double while the
other side has the correct/actual value. On our prod systems this shows up
as several thousand requests, which according to the logs can't be correct.
Does anybody else see similar weirdness, or can you guys confirm the false
values?
The *_cnt values seem to be ok but the *_rate ones seem to be false in
some cases.
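For reference, this is roughly what I expect a counter like http_req_rate(10s) to report. The Python below is a simplified two-window estimate written for illustration only: the class name PeriodRate is hypothetical, and it is not copied from HAProxy's freq_ctr code.

```python
class PeriodRate:
    """Approximate events-per-period from the current and the previous window."""

    def __init__(self, period_ms):
        self.period = period_ms
        self.curr_tick = 0   # start of the current window, in ms
        self.curr = 0        # events seen in the current window
        self.prev = 0        # events seen in the previous window

    def _rotate(self, now_ms):
        elapsed = now_ms - self.curr_tick
        if elapsed < self.period:
            return
        # One period elapsed: current becomes previous. Two or more: both are stale.
        self.prev = self.curr if elapsed < 2 * self.period else 0
        self.curr = 0
        self.curr_tick += (elapsed // self.period) * self.period

    def add(self, now_ms, n=1):
        self._rotate(now_ms)
        self.curr += n

    def rate(self, now_ms):
        self._rotate(now_ms)
        remaining = self.period - (now_ms - self.curr_tick)
        # The previous window counts only for the part still inside the sliding period.
        return self.curr + self.prev * remaining // self.period

ctr = PeriodRate(10_000)
for i in range(20):                 # 20 requests within the first 2 seconds
    ctr.add(i * 100)
print(ctr.rate(2_000))              # 20 -> above the "gt 15" threshold
print(ctr.rate(15_000))             # 10 -> half of the previous window still counts
print(ctr.rate(30_000))             # 0  -> everything has expired
```

The estimate depends on curr_tick being correct, so a window start that is off on the receiving peer would directly distort the *_rate values while leaving the *_cnt values untouched.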


--
Regards,
Christian Ruppert



capturing samples / evaluating conditionals

2016-03-22 Thread Christian Ruppert

Hi,

I'm trying to set up a parallel RSA/ECC setup as described here:
http://blog.haproxy.com/2015/07/15/serving-ecc-and-rsa-certificates-on-same-ip-with-haproxy/
but in my case the sample was never captured and thus the ECC backend has
never been used until I added something else that depends on a sample, like:

acl foo req_ssl_ver lt 3
tcp-request content reject if foo

So I thought instead of adding something that triggers the sample capture I
could use something like this:

...
tcp-request inspect-delay 4s
acl HAS_ECC req.ssl_ec_ext eq 1
tcp-request content reject if !HAS_ECC
use_backend ssl-ecc if HAS_ECC
...

That works so far but for some reason smp_fetch_req_ssl_ec_ext() is called
twice. On the first call the sample buffer is empty again but on the second
it's filled with the actual capture and it seems to work.

So my question(s) now:
1. Is/was it really intentional to not evaluate if there's just something
like "use_backend somebackend if { ... }"?

2. Why is the function called twice? That's only when using the ACL variant.
Using the workaround with req_ssl_ver above just calls it once.


--
Regards,
Christian Ruppert



Re: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

Hi Cyril,

On 2016-03-16 16:14, Cyril Bonté wrote:

Hi all,

replying really quickly from a webmail, sorry for the lack of details


[...]
I also ran 2 parallel "ab" on two separate machines against a third one.
The requests per second were around ~70 r/s per host instead of ~140. So
I doubt it's an entropy problem.


The issue is in your haproxy configuration : you disabled HTTP
keep-alive by using "option httpclose", so you are benchmarking SSL
handshakes and your values are not unusual in that case.
Please try with something else, like "option http-server-close".


The "option httpclose" was on purpose. Also the client could (during an
attack) simply do the same and achieve the same result. I don't think
that will help in such cases.


--
Regards,
Christian Ruppert



Re: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

Hi Willy,

On 2016-03-17 06:05, Willy Tarreau wrote:

Hi Christian,

On Wed, Mar 16, 2016 at 05:25:53PM +0100, Christian Ruppert wrote:

Hi Lukas,

On 2016-03-16 16:53, Lukas Tribus wrote:
>>The "option httpclose" was on purpose. Also the client could (during a
>>attack) simply do the same and achieve the same result. I don't think
>>that will help in such cases.
>
>So what you are actually and purposely benchmarking are SSL/TLS
>handshakes, because thats the bottleneck you are trying to improve.

You're right, yes.


You also found the hard way why it's important to share TLS secrets
between multiple front nodes, or to properly distribute the load to
avoid handshakes as much as possible.


I also just stumbled over this:
https://software.intel.com/en-us/articles/accelerating-ssl-load-balancers-with-intel-xeon-v3-processors
Might be interesting for others as well. So ECC and 
multi-threaded/process is the way to go it seems.





>Both nginx [1] and haproxy currently do not support offloading TLS
>handshakes to another thread or dedicating a thread to a TLS session.
>
>Thats why Apache will scale better currently, because its threading.

Hm, I haven't tried Apache yet but would that be a huge benefit compared to
a setup using nbproc > 1?


Here I don't know. TLS handshakes are one large part of what made me think
that we must go multi-threaded instead of multi-process over the long term,
just because I want to be able to pin some tasks to some CPUs. I.e. when TLS
says "handshake needed", we want to be able to migrate the task to another
CPU to avoid the huge latency imposed on all other processing (eg: 7ms in
your case).

But note that people who have to deal with heavy SSL traffic actually
deal with this in haproxy by using two levels of processing, one for
HTTP and one for TLS. It means that only TLS traffic can be hurt by
handshakes :

   listen secure
        bind :443 ssl crt foo.pem process 2-32
        mode tcp
        server clear 127.0.0.1:80

   frontend clear
        bind :80 process 1
        mode http
        use_backend my_wonderful_server_farm

   ...



Your example would be better and easier but we need the client IP for 
ACLs and so forth which wouldn't work in tcp mode and there would be no 
XFF header. So we're duplicating stuff in the frontend but use one 
backend.



And before the Linux kernel reintroduced support for SO_REUSEPORT (in
3.9), it was common to have a single process load-balance incoming
TCP connections to all other TLS processes. It then makes it possible
to choose the LB algo you want, including source hash so that a same
attacker can only affect one process for example.
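The source-hash idea can be sketched in a few lines of Python. This is an illustration only: pick_process is a made-up name, CRC32 is a stand-in and not the hash HAProxy's "balance source" uses, and the 2-32 range simply mirrors the "process 2-32" bind in the example above.

```python
import ipaddress
import zlib

def pick_process(src_ip, first_proc=2, last_proc=32):
    """Map a client address to one TLS process number in [first_proc, last_proc]."""
    nb = last_proc - first_proc + 1
    key = ipaddress.ip_address(src_ip).packed   # 4 bytes for IPv4, 16 for IPv6
    return first_proc + zlib.crc32(key) % nb

for ip in ("192.0.2.10", "192.0.2.10", "198.51.100.7"):
    # The same source always lands on the same process.
    print(ip, "->", pick_process(ip))
```

Because the mapping is deterministic, a single client hammering the frontend with handshakes keeps hitting one process instead of slowing all of them down.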

Willy


--
Regards,
Christian Ruppert



Re: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

Hi Aleks,

On 2016-03-16 15:57, Aleksandar Lazic wrote:

Hi.

Am 16-03-2016 15:17, schrieb Christian Ruppert:

Hi,

this is rather unrelated to HAProxy, so more of a general problem, but
anyway. I did some tests with SSL vs. non-SSL performance and I wanted to
share my results with you guys, but I'm also trying to solve the actual
problem.

So here is what I did:


[snipp]


A test without SSL, using "ab":
# ab -k -n 5000 -c 250 http://127.0.0.1:65410/


[snipp]

That's much worse than I expected it to be. ~144 requests per second instead of
42*k*. That's more than a 99% performance drop. The cipher is a moderate but
secure one (for now), so I doubt that changing the cipher will help a lot here.
nginx and HAProxy performance is almost equal so it's not a problem with the
server software.
One could increase nbproc (at least in my case it only scaled up to nbproc 4,
Xeon E3-1281 v3) but that's just a rather minor enhancement. With those ~144 r/s
you're basically lost when being under attack. How did you guys solve this
problem?
External SSL offloading, using hardware crypto foo, special cipher/settings
tuning, simply *much* more hardware or not yet at all?


You run both client & server on the same machine

Maybe you are running out of entropy?
Are you able to run the client on a different machine?

BR Aleks


I also ran 2 parallel "ab" on two separate machines against a third one.
The requests per second were around ~70 r/s per host instead of ~140. So
I doubt it's an entropy problem.


--
Regards,
Christian Ruppert



Re: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

On 2016-03-18 11:31, Christian Ruppert wrote:

Hi Willy,

On 2016-03-17 06:05, Willy Tarreau wrote:

Hi Christian,

On Wed, Mar 16, 2016 at 05:25:53PM +0100, Christian Ruppert wrote:

Hi Lukas,

On 2016-03-16 16:53, Lukas Tribus wrote:
>>The "option httpclose" was on purpose. Also the client could (during a
>>attack) simply do the same and achieve the same result. I don't think
>>that will help in such cases.
>
>So what you are actually and purposely benchmarking are SSL/TLS
>handshakes, because thats the bottleneck you are trying to improve.

You're right, yes.


You also found the hard way why it's important to share TLS secrets
between multiple front nodes, or to properly distribute the load to
avoid handshakes as much as possible.


I also just stumbled over this:
https://software.intel.com/en-us/articles/accelerating-ssl-load-balancers-with-intel-xeon-v3-processors
Might be interesting for others as well. So ECC and
multi-threaded/process is the way to go it seems.




>Both nginx [1] and haproxy currently do not support offloading TLS
>handshakes to another thread or dedicating a thread to a TLS session.
>
>Thats why Apache will scale better currently, because its threading.

Hm, I haven't tried Apache yet but would that be a huge benefit compared to
a setup using nbproc > 1?


Here I don't know. TLS handshakes are one large part of what made me think
that we must go multi-threaded instead of multi-process over the long term,
just because I want to be able to pin some tasks to some CPUs. I.e. when TLS
says "handshake needed", we want to be able to migrate the task to another
CPU to avoid the huge latency imposed on all other processing (eg: 7ms in
your case).

But note that people who have to deal with heavy SSL traffic actually
deal with this in haproxy by using two levels of processing, one for
HTTP and one for TLS. It means that only TLS traffic can be hurt by
handshakes :

   listen secure
        bind :443 ssl crt foo.pem process 2-32
        mode tcp
        server clear 127.0.0.1:80

   frontend clear
        bind :80 process 1
        mode http
        use_backend my_wonderful_server_farm

   ...



Your example would be better and easier but we need the client IP for
ACLs and so forth which wouldn't work in tcp mode and there would be
no XFF header. So we're duplicating stuff in the frontend but use one
backend.



Hm, I'm not sure how that would perform with "server ... send-proxy[-v2]" in
the listen block and an additional "bind :anotherport accept-proxy" in the
frontend block.
So it's either duplicating a lot of ACLs and so forth, or using your
(simplified) example with the PROXY protocol.
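For context, this is what the PROXY protocol adds on the wire so that the client IP survives the TCP-mode hop. The Python below is a small sketch of the human-readable v1 header only (send-proxy-v2 uses a binary format that is not covered here); the function names are invented for this example and the "PROXY UNKNOWN" form is deliberately not handled.

```python
def build_proxy_v1(src_ip, dst_ip, src_port, dst_port):
    """Build the v1 line that is sent once, ahead of the payload."""
    family = "TCP6" if ":" in src_ip else "TCP4"
    return f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode("ascii")

def parse_proxy_v1(data):
    """Split a PROXY v1 header off the front of data; return (info, rest)."""
    line, sep, rest = data.partition(b"\r\n")
    parts = line.decode("ascii").split(" ")
    if not sep or len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 TCP header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    info = {"family": family,
            "src": (src_ip, int(src_port)),
            "dst": (dst_ip, int(dst_port))}
    return info, rest

stream = build_proxy_v1("203.0.113.7", "192.0.2.1", 51234, 443) + b"GET / HTTP/1.1\r\n\r\n"
info, payload = parse_proxy_v1(stream)
print(info["src"])      # ('203.0.113.7', 51234)
print(payload[:14])     # b'GET / HTTP/1.1'
```

The frontend that accepts the header sees the original source address, so src-based ACLs and the XFF header can be handled in one place.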



And before the Linux kernel reintroduced support for SO_REUSEPORT (in
3.9), it was common to have a single process load-balance incoming
TCP connections to all other TLS processes. It then makes it possible
to choose the LB algo you want, including source hash so that a same
attacker can only affect one process for example.

Willy


--
Regards,
Christian Ruppert



Re: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

On 2016-03-17 00:14, Nenad Merdanovic wrote:

Hello,

On 3/16/2016 6:25 PM, Christian Ruppert wrote:


Some customers may require 4096 bit keys as it seems to be much more
decent than 2048 nowadays. So you may be limited here. A test with a
2048 bit cert gives me around 770 requests per second, a test with a
256 bit ECC cert around 1600 requests per second. That's still more than
96% difference compared to non-SSL, way better than the 4096 bit RSA one
though. I also have to make sure that even some older clients can
connect to the site, so I have to take a closer look at the ECC certs
and ciphers then. ECC is definitely an enhancement, if there's no
compatibility problem.
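To put those figures side by side, here is a quick Python calculation. It assumes the roughly 42k r/s plain-HTTP baseline and the ~144 r/s 4096-bit RSA result mentioned elsewhere in this thread; drop_percent is just a helper name for this example.

```python
def drop_percent(baseline_rps, measured_rps):
    """Relative slowdown versus the baseline, in percent."""
    return (1 - measured_rps / baseline_rps) * 100

baseline = 42_000   # approximate non-SSL rate quoted in this thread
for label, rps in [("RSA 4096", 144), ("RSA 2048", 770), ("ECC 256", 1600)]:
    print(f"{label}: {rps} r/s, {drop_percent(baseline, rps):.1f}% below non-SSL")
```

Seen this way the ECC cert is about 11 times faster than the 4096-bit RSA one, while all three remain a small fraction of the non-SSL rate when every request needs a full handshake.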


HAproxy can, in latest versions, serve both ECC and RSA certificates
depending on client support. In a fairly large environment I have found
that about 85% of clients are ECC capable. Also, look at configuring TLS
ticket keys and rotating them properly as well as using keepalive.

The difference in performance you are observing is fairly normal. You
can measure the SSL performance of your CPU using 'openssl speed' to see
how many computes/s you get without the HAproxy penalty, but the numbers
should be very close.

Another thing you might consider is switching to OpenSSL 1.0.2 because
you have a v3 Intel Xeon which has AVX2 instruction support and will
benefit from improvements done in 1.0.2.



That's indeed a noticeable performance increase on RSA but I couldn't 
notice any difference for ECC.



In an SSL-heavy environment, we use servers with a lot of cores, albeit
slower per core, and with a good DDoS-protection ruleset haven't
experienced any attacks that weren't effectively mitigated.

With a properly configured SSL stack in HAproxy (all of the things
mentioned above), the CPU usage difference is almost negligible. And to
be honest, there are not that many SSL-exhaustion attacks.



For now perhaps, but more and more sites/customers want 100% https;
whether it's just cool or indeed useful doesn't matter. And I am
somewhat scared that one can take down the site with very few requests
just by disabling keep-alive and other features on the client side.







Both nginx [1] and haproxy currently do not support offloading TLS
handshakes to another thread or dedicating a thread to a TLS session.

Thats why Apache will scale better currently, because its threading.


Hm, I haven't tried Apache yet but would that be a huge benefit compared
to a setup using nbproc > 1?


No :) Your CPU can only give as much.

Regards,
Nenad


--
Regards,
Christian Ruppert



RE: General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert

On 2016-03-16 17:56, Lukas Tribus wrote:

Some customers may require 4096 bit keys as it seems to be much more
decent than 2048 nowadays.


I've not come across any recommendations pointing in that direction, in
fact 2048-bit RSA are supposed to be safe for commercial use until 2030.


I don't think this is a real requirement from knowledgeable people, to
be frank.


That's almost always the case when talking about requirements.



In any case it doesn't make any sense because if your customer really has
such huge requirements you may as well switch to ECC because you won't
be able to support old clients anyway.



I just compared the RSA one against ECC on ssllabs and it seems there's 
no difference on the browser/device compatibility topic. So we should 
indeed consider ECC keys.






That's still more than 96% difference compared to non-SSL


Well, you are basically benchmarking your stack with a TLS specific
denial of service attack. Of course the same attack without TLS won't
have a noticeable effect on the stack. So that number is quite obviously
high.



Yeah, but to me it looks like almost anybody else will be affected as
well when migrating to 100% https. A few hosts could easily take down
the site by disabling keep-alive and so on on the client while doing
some "valid" requests. That's hard to notice compared to http only,
because they need far fewer requests, connections etc.






Thats why Apache will scale better currently, because its threading.


Hm, I haven't tried Apache yet but would that be a huge benefit compared
to a setup using nbproc > 1?


I haven't tried it either, but yes, I would assume so. It also doesn't block
other connections while handshaking new ones.




Regards,

Lukas


--
Regards,
Christian Ruppert



General SSL vs. non-SSL Performance

2016-03-19 Thread Christian Ruppert
0
Write errors:   0
Keep-Alive requests:0
Total transferred:  46 bytes
HTML transferred:   55000 bytes
Requests per second:144.14 [#/sec] (mean)
Time per request:   1734.425 [ms] (mean)
Time per request:   6.938 [ms] (mean, across all concurrent requests)
Transfer rate:  12.95 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      326  1057 236.0   1042    1709
Processing:    35   658 210.9    660    1013
Waiting:       35   658 211.1    659    1012
Total:       1264  1716 109.3   1702    2651

Percentage of the requests served within a certain time (ms)
  50%   1702
  66%   1708
  75%   1712
  80%   1714
  90%   1720
  95%   1779
  98%   2158
  99%   2211
 100%   2651 (longest request)






That's much worse than I expected it to be. ~144 requests per second
instead of 42*k*. That's more than a 99% performance drop. The cipher is
moderate but secure (for now); I doubt that changing the cipher will help
a lot here. nginx and HAProxy performance is almost equal, so it's not a
problem with the server software.
One could increase nbproc (in my case throughput only improved up to
nbproc 4, Xeon E3-1281 v3), but that's just a rather minor enhancement.
With those ~144 r/s you're basically lost when under attack. How did you
guys solve this problem?
External SSL offloading, hardware crypto, special cipher/settings tuning,
simply *much* more hardware, or not at all yet?

--
Regards,
Christian Ruppert



RE: General SSL vs. non-SSL Performance

2016-03-18 Thread Christian Ruppert

Hi Lukas,

On 2016-03-16 16:53, Lukas Tribus wrote:

The "option httpclose" was on purpose. Also the client could (during an
attack) simply do the same and achieve the same result. I don't think
that will help in such cases.


So what you are actually and purposely benchmarking are SSL/TLS
handshakes, because that's the bottleneck you are trying to improve.


You're right, yes.



First of all the selected cipher is very important, as is the certificate
and the RSA key size.

For optimal performance, you would drop your RSA certificate
and get an ECC cert. If that's not a possibility then use 2048-bit
RSA certificates.


Your ab output suggests that the negotiated cipher is
ECDHE-RSA-AES128-GCM-SHA256 - which is fine for RSA certificates,
but your RSA certificate is 4096 bit long, which is where the performance
penalty comes from - use 2048-bit certificates or better yet use ECC
certificates.

read: DO NOT USE RSA certificates longer than 2048 bit.


Some customers may require 4096-bit keys, as 4096 bit seems to be regarded
as much more adequate than 2048 nowadays. So you may be limited here. A
test with a 2048-bit cert gives me around ~770 requests per second, a test
with a 256-bit ECC cert around 1600 requests per second. That's still more
than 96% difference compared to non-SSL, though way better than the
4096-bit RSA one. I also have to make sure that even some older clients
can connect to the site, so I have to take a closer look at the ECC certs
and ciphers then. ECC is definitely an enhancement, if there's no
compatibility problem.
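
To put those figures side by side, here is a small Python sketch of the
arithmetic. It assumes the non-SSL baseline is roughly the 42k requests
per second mentioned in the benchmark post (the exact value is an
assumption), and relative_drop is just an illustrative helper name.

```python
# Assumed non-SSL baseline in requests per second ("42k" in the thread).
BASELINE_RPS = 42_000


def relative_drop(rps, baseline=BASELINE_RPS):
    """Return how far rps falls below baseline, in percent."""
    return (baseline - rps) / baseline * 100.0


# Throughput figures quoted in this thread (requests per second).
results = {
    "RSA 4096": 144.14,
    "RSA 2048": 770.0,
    "ECC 256": 1600.0,
}

for name, rps in results.items():
    print(f"{name}: {rps:8.2f} req/s, {relative_drop(rps):.1f}% below baseline")
```

Even the fastest variant stays a long way below the plain HTTP number,
which is the point being made above.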





Both nginx [1] and haproxy currently do not support offloading TLS
handshakes to another thread or dedicating a thread to a TLS session.

That's why Apache will scale better currently, because of its threading.


Hm, I haven't tried Apache yet but would that be a huge benefit compared 
to a setup using nbproc > 1?






Hope this helps,

Lukas



[1] https://twitter.com/ngx_vbart/status/611956593324916736


--
Regards,
Christian Ruppert



Custom SSL DHparams prime

2015-05-21 Thread Christian Ruppert

Hi,

from what I've seen in the sources and documentation a default and 
pre-generated prime will be used as default (unless appended to the 
certificate). HAProxy uses the related functions provided by OpenSSL 
itself (get_rfc3526_prime_2048, ...). What I miss here is an option to 
specify my own dhparams file to avoid using those pre-generated ones 
and/or appending some to all certificates. Wouldn't it make sense to 
allow it to be read from a file, globally?


--
Regards,
Christian Ruppert



Re: Custom SSL DHparams prime

2015-05-21 Thread Christian Ruppert

On 2015-05-21 18:20, Remi Gacogne wrote:

Hi,


from what I've seen in the sources and documentation a default and
pre-generated prime will be used as default (unless appended to the
certificate). HAProxy uses the related functions provided by OpenSSL
itself (get_rfc3526_prime_2048, ...).  What I miss here is an option to
specify my own dhparams file to avoid using those pre-generated ones
and/or appending some to all certificates. Wouldn't it make sense to
allow it to be read from a file, globally?


I don't think the 2048-bit MODP group 14 used by HAProxy is at risk
right now, still it can't hurt to use a large number of different groups.

You can use your own dhparam by appending it to the file specified with
the crt command, after your certificate chain and key.


Well, I meant globally, as default.

global
tune.ssl.default-dh-param /path/to/custom/dhparams.pem

2048 was just an example. There is 1024 and IIRC 768 as well. One might 
be forced to use 1024.
Also, according to the documentation HAProxy wouldn't allow/use anything 
greater than tune.ssl.default-dh-param which is 1024 by default - unless 
I misunderstood something.


--
Regards,
Christian Ruppert



Re: base32+src

2015-02-13 Thread Christian Ruppert

Hi Yuan,

On 2015-02-12 17:39, Yuan wrote:

Hello Experts,

Our customer’s website has just been brought down by bots.bots
website aware.

base32+src can look at src + url.

I am not good at this. I am hoping I can get some help to create the
needed config. Can I do the below config ;

# Begin DDOS-Protection-Config
# Monitor the number of request sent by an IP over a period of 10 seconds
 stick-table type base32+src size 1m expire 10s store gpc0,http_req_rate(10s)
 tcp-request connection track-sc1 src
 # Refuses a new connection from an abuser
 tcp-request content reject if { src_get_gpc0 gt 0 }
 # Returns a 403 response for requests in an established connection
 http-request deny if { src_get_gpc0 gt 0 }

I think this config is wrong. Any help, tips or a sample config using
base32+src would be appreciated, maybe a link where someone posted a
sample config using base32+src. I have both port 80 and port 443, with
port 80 rewritten to port 443.


Due to lack of time I can't help you that much, but what you miss is 
increasing the gpc0 counter. You should take a look at HAProxy rate 
limiting stuff, there are some good examples out there, e.g.:

http://brokenhaze.com/blog/2014/03/25/how-stack-exchange-gets-the-most-out-of-haproxy/

It's also pretty easy to test with a few shells, curl and socat.
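
To illustrate the missing step, here is a toy Python model of what such
a table does conceptually. It is not HAProxy code: RateTable, its limit
and period, and the use of CRC32 as the URL hash are all assumptions
made for the sketch (HAProxy uses its own hash for base32). The
important part is that something has to set the flag once the rate is
exceeded, which is what incrementing gpc0 does.

```python
import zlib
from collections import defaultdict, deque


class RateTable:
    """Toy model of a table keyed on (hash of Host+path, source IP)."""

    def __init__(self, period=10.0, limit=20):
        self.period = period              # sliding window in seconds
        self.limit = limit                # allowed requests per window
        self.hits = defaultdict(deque)    # key -> request timestamps
        self.flagged = set()              # keys whose "gpc0" was raised

    def key(self, host, path, src):
        # Stand-in for base32+src: a 32-bit hash of Host+path plus the IP.
        base32 = zlib.crc32((host + path).encode()) & 0xFFFFFFFF
        return (base32, src)

    def track(self, host, path, src, now):
        """Record one request; return True if it should be denied."""
        k = self.key(host, path, src)
        q = self.hits[k]
        q.append(now)
        # Drop timestamps that fell out of the window.
        while q and now - q[0] >= self.period:
            q.popleft()
        if len(q) > self.limit:
            # The step missing from the config above: mark the abuser.
            self.flagged.add(k)
        return k in self.flagged


table = RateTable(period=10.0, limit=3)
for i in range(5):
    denied = table.track("example.com", "/login", "203.0.113.7", now=i * 0.1)
    print(i, denied)
```

In this sketch a flagged key stays flagged forever; in a real table the
entry would expire, so treat it only as a way to reason about the logic.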



I had some help from Willy about using base32+src which I understood
in theory but I am not good enough to convert that wonderful advice to
a workable config.

Best regards,
; Yuan


--
Regards,
Christian Ruppert



Re: Global ACLs

2015-02-03 Thread Christian Ruppert

Hi Willy,

On 2015-02-02 17:31, Willy Tarreau wrote:

Hi Christian,

On Mon, Feb 02, 2015 at 04:55:56PM +0100, Christian Ruppert wrote:

Hey,

are there some kind of global ACLs perhaps? I think that could be really
useful. In my case I have ~70 frontends and ~100 backends. I often use
the same ACLs on multiple frontends/backends for specific whitelisting
etc.
It would be extremely helpful to specify some of those ACLs in the
global scope and use them where needed without having to re-define them
again and again.
Technically that shouldn't be much different from what it does in the
local scope, should it?
So I guess the ACL is prepared once on startup, it shouldn't matter where
that is done. Using it, i.e. actually evaluating it, is always (as before)
done in the local scope, depending on the actual layer etc.

So adding support for global ACLs should be easy and helpful, or am I
wrong? Did I forget something important here?

Example:

global
acl foo src 192.168.1.1
acl foobar hdr_ip(X-Forwarded-For,-1) 192.168.1.2 # This *might* be a special case... Not yet further verified.


frontend example

use_backend ... if foo
use_backend ... if foobar



We've been considering this for a while now without any elegant solution.
Recently while discussing with Emeric we got an idea to implement scopes,
and along these lines I think we could instead try to inherit ACLs from
other frontends/backends/defaults sections. Currently defaults sections
support having a name, though this name is not internally used, admins
often put some notes there such as tcp or a customer's id.


That would be perfect, even better than just global.
One could use the same ACL names but in different scopes, i.e. at 
different layers.




Here we could have something like this :

defaults foo
acl local src 127.0.0.1

frontend bar
acl client src 192.168.0.0/24
use_backend c1 if client
use_backend c2 if foo/local

It would also bring the extra benefit of allowing complex shared configs
to use their own global ACLs regardless of what is being used in other
sections.

That's just an idea, of course.


Yeah, that sounds pretty decent to me.
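
Just to make the lookup rule of that idea concrete, a short Python
sketch. This is purely hypothetical: the section/name syntax is only the
proposal from above, and resolve_acl and the sections dictionary are
names I made up for the illustration, nothing of this exists in HAProxy.

```python
def resolve_acl(name, current_section, sections):
    """Resolve 'acl' locally or 'section/acl' in another named section."""
    if "/" in name:
        scope, acl = name.split("/", 1)     # explicit scope: foo/local
    else:
        scope, acl = current_section, name  # bare name: current section
    try:
        return sections[scope][acl]
    except KeyError:
        raise LookupError(
            f"unknown ACL {name!r} referenced from section {current_section!r}"
        )


# The example from above: "defaults foo" and "frontend bar".
sections = {
    "foo": {"local": "src 127.0.0.1"},
    "bar": {"client": "src 192.168.0.0/24"},
}

print(resolve_acl("client", "bar", sections))
print(resolve_acl("foo/local", "bar", sections))
```

With that rule the same ACL name can exist in several sections without
clashing, since a bare name never leaves the current section.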



Regards,
Willy


--
Regards,
Christian Ruppert



Global ACLs

2015-02-02 Thread Christian Ruppert

Hey,

are there some kind of global ACLs perhaps? I think that could be really 
useful. In my case I have ~70 frontends and ~100 backends. I often use 
the same ACLs on multiple frontends/backends for specific whitelisting 
etc.
It would be extremely helpful to specify some of those ACLs in the 
global scope and use it where needed without having to re-define it 
again and again.
Technically that shouldn't be much different from what it does in the 
local scope, shouldn't it?
So I guess the ACL is prepared once on startup, it shouldn't matter where 
that is done. Using it, i.e. actually evaluating it, is always (as before) 
done in the local scope, depending on the actual layer etc.


So adding support for global ACLs should be easy and helpful, or am I 
wrong? Did I forget something important here?


Example:

global
acl foo src 192.168.1.1
acl foobar hdr_ip(X-Forwarded-For,-1) 192.168.1.2 # This *might* be a special case... Not yet further verified.



frontend example

use_backend ... if foo
use_backend ... if foobar



--
Regards,
Christian Ruppert



No TCP RST on tcp-request connection reject

2015-01-14 Thread Christian Ruppert
Hey guys,

just a thought... wouldn't it make sense to add an option to tcp-request
connection reject to disable the actual TCP RST? So, an attacker tries to
(keep) open a lot of ports:

a) HAProxy (configured with rate limiting etc.) does a tcp-request connection
reject which ends up as a TCP RST. The attacker gets the RST and immediately
tries again
b) the same as a) but the socket will be closed on the server side but no RST,
nothing will be sent back to the remote side. The connections on the remote side
will be kept open until timeout.

Wouldn't it make sense to implement an option for b) so it can be used during
major attacks or so?



Re: No TCP RST on tcp-request connection reject

2015-01-14 Thread Christian Ruppert
Hi Baptiste,

tarpit is pretty handy but as far as I understood it will keep the connection
open, on both sides. So at some point (pretty quickly actually) we cannot handle
any more connections on that host. The host will become slow and/or
unresponsive. When we close the connection on our local side but don't notify
the remote side, it will probably exhaust the attacker and we could handle more
connections and/or free and re-use those connections that have been classified
as too much.

On 01/14/2015 05:28 PM, Baptiste wrote:
 On Wed, Jan 14, 2015 at 5:00 PM, Christian Ruppert c.rupp...@babiel.com 
 wrote:
 Hey guys,

 just a thought... wouldn't it make sense to add an option to tcp-request
 connection reject to disable the actual TCP RST? So, an attacker tries to
 (keep) open a lot of ports:

 a) HAProxy (configured with rate limiting etc.) does a tcp-request 
 connection
 reject which ends up as a TCP RST. The attacker gets the RST and 
 immediately again
 b) the same as a) but the socket will be closed on the server side but no 
 RST,
 nothing will be sent back to the remote side. The connections on the remote 
 side
 will be kept open until timeout.

 Wouldn't it make sense to implement an option for b) so it can be used during
 major attacks or so?

 
 Hi Christian,
 
 Have you had a look at tarpit related options from HAProxy?
 You can slowdown the attack thanks to it.
 
 Baptiste
 

-- 
Kind regards,
Christian Ruppert
System Administrator

..

Babiel GmbH
Erkrather Str. 224 a
D-40233 Düsseldorf

Tel: 0211-179349 0
Fax: 0211-179349 29
c.rupp...@babiel.com

http://www.babiel.com

MANAGING DIRECTORS
Georg Babiel, Dr. Rainer Babiel, Harald Babiel
Düsseldorf Local Court (Amtsgericht Düsseldorf), HRB 38633

DISCLAIMER
The information transmitted in this electronic mail message may contain
confidential and or privileged materials. Any review, retransmission,
dissemination or other use of or taking of any action in reliance upon, this
information by persons or entities other than the intended recipient is
prohibited. If you receive such e-mails in error, please contact the sender and
delete the material from any computer.



Re: how to update config w/o stopping the haproxy service

2013-04-28 Thread Christian Ruppert
On 04/28/13 at 01:00PM -0400, S Ahmed wrote:
 Hi,
 
 1.  Is there a way to update the config file without having to stop/start
 the haproxy service? e.g. when I need to update the ip addresses of the
 backend servers (using ec2)
 
 2. During migrations, say I have 10 backend servers, what if I want to stop
 taking requests for 5 of the 10 servers, is the best way to update the
 config and just remove them?  Or is there a smoother transition somehow
 that won't causes errors during the transition?
 i.e. would it be possible to finish the requests, but stop responding to
 new requests for those 5 servers I want to take offline.

See https://code.google.com/p/haproxy-docs/wiki/disabled

You can restart HAProxy with, e.g., the options:
-D -p /var/run/haproxy.pid -f /etc/haproxy.cfg -sf $(cat /var/run/haproxy.pid)
Alternatively you could use the control socket by using socat:
https://code.google.com/p/haproxy-docs/wiki/UnixSocketCommands

So e.g. disable server backend1/server1
Or even via the stats interface with stats admin if 

-- 
Regards,
Christian Ruppert




Re: use_backend: brackets/grouping not accepted in condition

2013-03-22 Thread Christian Ruppert
Hi Baptiste,

it is IMHO not really clear that brackets are for anonymous ACLs only.
Wouldn't it make sense to support it for use_backend as well?
It just makes things easier in my opinion.

Kind regards,
Christian Ruppert




 -----Original Message-----
 From: Baptiste [mailto:bed...@gmail.com]
 Sent: Thursday, 21 March 2013 20:00
 To: Christian Ruppert
 Cc: haproxy@formilux.org
 Subject: Re: use_backend: brackets/grouping not accepted in condition
 
 Hi Christian,
 
 Brackets are for anonymous ACLs only.
 You seem to use named ACLs with brackets so it can't work.
 
 Either you do as you said:
  use_backend backend_test if request_domain1 allowed_ip_foo or
 request_domain1 allowed_ip_bar
 
 Or with 2 use_backend:
  use_backend backend_test if request_domain1 allowed_ip_foo
  use_backend backend_test if request_domain1 allowed_ip_bar
 
 Baptiste
 
 
 
 On Thu, Mar 21, 2013 at 6:25 PM, Christian Ruppert c.rupp...@babiel.com
 wrote:
  Hi Guys,
 
  I just tried to simplify some rules and I noticed that brackets {} don't
  work with use_backend while they work fine with default_backend.
 
  That doesn't work:
  use_backend backend_test if request_domain1 { allowed_ip_foo or
 allowed_ip_bar }
 
  That works:
  use_backend backend_test if request_domain1 allowed_ip_foo or
 request_domain1 allowed_ip_bar
 
  That works as well:
  default_backend backend_main if request_domain2 { allowed_ip_foo or
 allowed_ip_bar }
 
  I could also use multiple use_backend lines but using brackets would make
  it a lot easier and more readable IMHO.
 
  https://code.google.com/p/haproxy-docs/wiki/UsingACLs
  That also sounds like the brackets should work almost everywhere.
 
  Some actions are only performed upon a valid condition. A condition is a
  combination of ACLs with operators. 3 operators are supported :
 
- AND (implicit)
- OR  (explicit with the or keyword or the || operator)
- Negation with the exclamation mark (!)
 
  A condition is formed as a disjunctive form:
 
 [!]acl1 [!]acl2 ... [!]acln  { or [!]acl1 [!]acl2 ... [!]acln } ...
 
  Such conditions are generally used after an if or unless statement,
  indicating when the condition will trigger the action.
 
  I would really like to see that fixed. Or is that on purpose?
 
  Kind regards,
  Christian Ruppert
 
  
 


Re: use_backend: brackets/grouping not accepted in condition

2013-03-22 Thread Christian Ruppert
Hi Bryan,

I am somewhat confused now.

So it sounds like the behavior of the brackets in combination with 
default_backend is wrong, since it seems to work fine there even with IP ACLs.

And what I meant is: wouldn't it make sense to support e.g. IP ACLs with either 
{} or () or whatever else, to allow one to group the rules instead of writing 
multiple use_backend lines?

For small stuff, like in my example, it would make it slightly "easier".

use_backend if somecondition (foo or bar)

vs.

use_backend if somecondition foo
use_backend if somecondition bar


Kind regards,

Christian Ruppert

 



 


 

From: Bryan Talbot [mailto:btal...@aeriagames.com] 
Sent: Friday, 22 March 2013 16:35
To: Christian Ruppert
Cc: Baptiste; HAproxy Mailing Lists
Subject: Re: use_backend: brackets/grouping not accepted in condition

 

On Fri, Mar 22, 2013 at 2:47 AM, Christian Ruppert c.rupp...@babiel.com wrote:

Hi Baptiste,

it is IMHO not really clear that brackets are for anonymous ACLs only.
Wouldn't it make sense to support it for use_backend as well?

 

 

Those two are not mutually exclusive: you can use them with use_backend, and 
they are for anonymous ACLs.

For example:

  use_backend www if METH_POST or { path_beg /static /images /img /css }

 

-Bryan

 



use_backend: brackets/grouping not accepted in condition

2013-03-21 Thread Christian Ruppert
Hi Guys,

I just tried to simplify some rules and I noticed that brackets {} don't work 
with use_backend while they work fine with default_backend.

That doesn't work:
use_backend backend_test if request_domain1 { allowed_ip_foo or allowed_ip_bar }

That works:
use_backend backend_test if request_domain1 allowed_ip_foo or request_domain1 
allowed_ip_bar

That works as well:
default_backend backend_main if request_domain2 { allowed_ip_foo or 
allowed_ip_bar }

I could also use multiple use_backend lines but using brackets would make it a 
lot easier and more readable IMHO.

https://code.google.com/p/haproxy-docs/wiki/UsingACLs
That also sounds like the brackets should work almost everywhere.

Some actions are only performed upon a valid condition. A condition is a
combination of ACLs with operators. 3 operators are supported :

  - AND (implicit)
  - OR  (explicit with the or keyword or the || operator)
  - Negation with the exclamation mark (!)

A condition is formed as a disjunctive form:

   [!]acl1 [!]acl2 ... [!]acln  { or [!]acl1 [!]acl2 ... [!]acln } ...

Such conditions are generally used after an if or unless statement,
indicating when the condition will trigger the action.
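
As I read that paragraph, a condition is an OR of AND-groups. Here is a
minimal Python sketch of that reading, only to make the logic explicit.
The evaluate function and the truth values are made up for the example
and this is not how HAProxy parses conditions internally.

```python
def evaluate(condition, acls):
    """Evaluate '[!]a [!]b or [!]c ...' as an OR of AND-groups."""
    groups = [[]]
    for tok in condition.split():
        if tok in ("or", "||"):
            groups.append([])        # start the next AND-group
        else:
            groups[-1].append(tok)   # implicit AND inside a group

    def term(tok):
        negated = tok.startswith("!")
        value = acls[tok.lstrip("!")]
        return not value if negated else value

    return any(all(term(t) for t in group) for group in groups)


# Hypothetical ACL results for one request.
acls = {
    "request_domain1": True,
    "allowed_ip_foo": False,
    "allowed_ip_bar": True,
}

expanded = "request_domain1 allowed_ip_foo or request_domain1 allowed_ip_bar"
print(evaluate(expanded, acls))
```

Under that reading the long form in "That works" above is simply the
grouped rule with request_domain1 distributed over both alternatives.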

I would really like to see that fixed. Or is that on purpose?

Kind regards,
Christian Ruppert





IPv6 ACLs for 1.4.x

2012-07-16 Thread Christian Ruppert
Hi guys,

I saw that 1.5.x will have IPv6 ACL support. Would it be possible to backport it
to 1.4.x? :)
I haven't looked at the patch yet though so I don't know how much work it may
be.

-- 
Regards,
Christian Ruppert

