Re: reg-tests situation in haproxy 1.8

2019-01-23 Thread Willy Tarreau
On Mon, Jan 21, 2019 at 03:28:35PM +0100, Frederic Lecaille wrote:
> On 1/19/19 8:53 AM, Willy Tarreau wrote:
> > I was interested in backporting them to 1.8 once we have more experience
> > with them and they're better organized, so that we avoid backporting
> > reorg patches. I'd say we've made quite some progress now and we could
> > possibly backport them. But I wouldn't be surprised if we'd soon rename
> > many of them again since the relation between the level and the prefix
> > letter has to be looked up into the makefile each time, so probably this
> > is something we should improve.
> 
> Note that a "reg-tests-help" makefile target dumps the list of LEVELs:

Oh, I'm well aware of this, and it's where I look it up every time; it's
just that I can't remember them as there's no mnemonic mapping, so I
systematically have to look them up.

> We could set the level with strings:
> 
> h*.vtc -> haproxy
> s*.vtc -> slow
> l*.vtc -> low   (perhaps this one should be removed).
> b*.vtc -> bug
> k*.vtc -> broken
> e*.vtc -> exp
> 
> Only a list of levels would be permitted:
> 
> $ LEVEL=haproxy,bug make reg-tests ...
> 
> As there is no longer a level notion here, perhaps we should rename the
> LEVEL environment variable to VTC_TYPES, REGTEST_TYPES or something else.

That could be better. Thinking about it further, since run-regtests
already parses the comments at the head of the files to find the
various prerequisites, maybe instead we should get rid of the difference
in the file name and mention the category with a full word as you did
above directly in the files. It would probably be more obvious when
editing these files. The "h*" files could become the default ones
("normal" ? "default" ?) when no category is set, and the other
categories would have to be explicitly mentioned. One benefit is that we
could keep the same naming during initial submission and final merging
when it's related to a bug report or a broken script.
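
To make the idea concrete, an in-file annotation could look roughly like this (the "REGTEST_TYPE" keyword and the commit reference are purely hypothetical here, just a sketch of the proposal above):

```
# commit b1234567
# BUG/MEDIUM: example subsystem: short description of the fixed bug
#
# REGTEST_TYPE=bug    <- category spelled out, instead of the "b" filename prefix
varnishtest "example regtest"
```

Since run-regtests already parses the comment header for prerequisites, picking up one more keyword there would avoid any renaming between submission and merge.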

What do you think ?

Willy



Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 09:37:46PM +0100, Aleksandar Lazic wrote:
> 
> Am 23.01.2019 um 21:27 schrieb Willy Tarreau:
> > On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
> >> Should it be possible to have a fe with h1 and a be server with h2 (alpn
> >> h2)? I expect this or a similar response when I go through haproxy.
> > 
> > Yes absolutely. That's even what I'm doing on my tests to try to fix
> > the issues reported by Luke.
> 
> Okay, perfect.
> 
> Would you like to share your config so that I can see what's wrong with
> mine? Thanks.

Sure, here's a copy-paste, hoping I don't mess with anything :-)

  defaults
mode http
option http-use-htx
option httplog
log stdout format raw daemon
timeout connect 4s
timeout client 10s
timeout server 10s

  frontend decrypt
bind :4445
bind :4446 proto h2
bind :4443 ssl crt rsa+dh2048.pem npn h2 alpn h2
default_backend trace

  backend trace
stats uri /stat
server s1 127.0.0.1:443 ssl alpn h2 verify none
#server s2 127.0.0.1:80
#server s3 127.0.0.1:80 proto h2

As you can see, you just connect to port 4445.

> >> I haven't seen any log option to get the backend request method, I think 
> >> this
> >> should be a feature request ;-).
> > 
> > What do you mean with "backend request method" precisely ?
> 
> As the log is for frontends, it would be nice to also get the info below
> for the backend, to see what was sent to the backend server.

But what is sent to the backend is what comes from the frontend. And there
never is any valid reason for rewriting the method. So the method sent to
the backend is *always* what you receive on the frontend.
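
In other words, the frontend log already describes the backend request; a minimal sketch (assuming the standard %HM/%HU log variables, nothing backend-specific needed):

```
# %HM and %HU logged on the frontend also describe the request sent to the
# backend, since the method and URI are not rewritten on the way through
log-format "%ci:%cp [%tr] %ft %b/%s %ST %HM %HU"
```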

Cheers,
Willy



Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Aleksandar Lazic


Am 23.01.2019 um 21:27 schrieb Willy Tarreau:
> On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
>> Should it be possible to have a fe with h1 and a be server with h2 (alpn
>> h2)? I expect this or a similar response when I go through haproxy.
> 
> Yes absolutely. That's even what I'm doing on my tests to try to fix
> the issues reported by Luke.

Okay, perfect.

Would you like to share your config so that I can see what's wrong with
mine? Thanks.

>> I haven't seen any log option to get the backend request method, I think this
>> should be a feature request ;-).
> 
> What do you mean with "backend request method" precisely ?

As the log is for frontends, it would be nice to also get the info below
for the backend, to see what was sent to the backend server.
The problem I see is that tcpdump/tshark does not help to see what's
transferred on the wire when the backend talks TLS.

https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#8.2.4

### current variables

  | H | %HM  | HTTP method (ex: POST)| string  |
  | H | %HP  | HTTP request URI without query string (path)  | string  |
  | H | %HQ  | HTTP request URI query string (ex: ?bar=baz)  | string  |
  | H | %HU  | HTTP request URI (ex: /foo?bar=baz)   | string  |
  | H | %HV  | HTTP version (ex: HTTP/1.0)   | string  |

Possible new:
  | H | %bM  | Backend HTTP method (ex: POST)| string  |
  | H | %bP  | Backend HTTP request URI without query string (path)  | string  |
  | H | %bQ  | Backend HTTP request URI query string (ex: ?bar=baz)  | string  |
  | H | %bU  | Backend HTTP request URI (ex: /foo?bar=baz)   | string  |
  | H | %bV  | Backend HTTP version (ex: HTTP/1.0)   | string  |

###

> Willy

Aleks



Re: DDoS protection: ban clients with high HTTP error rates

2019-01-23 Thread Shawn Heisey

On 1/23/2019 8:16 AM, Marco Colli wrote:
> 1. Based on advanced conditions (e.g. current user) our Rails
> application decides whether to return a normal response (e.g. 2xx) or
> a 429 (Too Many Requests); it can also return other errors, like 401
>
> 2. HAProxy bans clients if they produce too many 4xx errors
>
> What do you think about this solution?
> Also, is it correct to use HAProxy directly, or is it more performant
> to use fail2ban on HAProxy logs?


I'm definitely not an expert.  My opinion is that you should do both.

I haven't set up the protections in haproxy itself, but I know it can be 
done.  That's something I plan to look into when I find some time.


Just a couple of days ago, I set up a fail2ban jail that looks at the 
haproxy log and initiates bans based on what it finds.  It works REALLY 
well.


This is the definition that activates the jail, in a config file I've 
placed in /etc/fail2ban/jail.d:


[haproxy-custom]
enabled = true
findtime = 120
bantime = 3600
logpath  = /var/log/debug-haproxy
maxretry = 20

This is the definition of the filter it uses, in 
/etc/fail2ban/filter.d/haproxy-custom.conf:


[Definition]
_daemon = haproxy
failregex = ^%(__prefix_line)s<HOST>(?::\d+)?\s+.*<NOSRV>.*

Basically, if there are 20 or more NOSRV requests in the log over a 
timespan of two minutes from one source IP, that address gets banned.  
Most of the NOSRV requests in my log are http->https redirects.  All of 
the attacks that I have seen since setting this server up have come in 
as http, and I have haproxy configured to redirect ALL insecure requests 
to https.  I do have a few settings in haproxy that result in some 
connections being denied entirely, which also produces a NOSRV log.


It's entirely possible that a web application could be badly written 
such that it triggers this jail accidentally, but I would expect most 
applications to be just fine.


Legitimate traffic can produce the http->https redirects, but it's 
certainly not likely to get 20 of them in two minutes.


I may also implement a similar filter for repeated 404 errors and maybe 
other errors like 400 or 500, to cover attacks on the https frontends 
where the webserver says the path doesn't exist.
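
A rough sketch of what such a 404 filter might look like (untested; the field layout assumes haproxy's default "httplog" format, and the filter name is made up):

```
# /etc/fail2ban/filter.d/haproxy-404.conf (hypothetical)
[Definition]
_daemon = haproxy
# client [date] frontend backend/server timers, then a 404 status code;
# adjust the field pattern to match your actual log format
failregex = ^%(__prefix_line)s<HOST>(?::\d+)?\s+\[[^\]]+\] \S+ \S+/\S+ [\d/+-]+ 404 .*$
```

The same shape works for 400 or 500 by changing the status-code part of the pattern.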


The fail2ban package comes with a filter for 401 responses in the 
haproxy logs; I based my regex on that one.


Thanks,
Shawn




Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
> Should it be possible to have a fe with h1 and a be server with h2 (alpn
> h2)? I expect this or a similar response when I go through haproxy.

Yes absolutely. That's even what I'm doing on my tests to try to fix
the issues reported by Luke.

> I haven't seen any log option to get the backend request method, I think this
> should be a feature request ;-).

What do you mean with "backend request method" precisely ?

Willy



Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Aleksandar Lazic
Hi Willy.

Am 23.01.2019 um 19:50 schrieb Willy Tarreau:
> Hi Aleks,
> 
> On Wed, Jan 23, 2019 at 06:58:25PM +0100, Aleksandar Lazic wrote:
>> backend be_generic_tcp
>>   mode http
>>   balance source
>>   timeout check 5s
>>   option tcp-check
>>
>>   server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check 
>> inter 5s proto h2 ssl ssl-min-ver TLSv1.3 verify none
> 
> You need to replace "proto h2" with "alpn h2", so that the application
> protocol is announced to the other host, otherwise it will stick to the
> default, very likely "http/1.1", while haproxy talks h2 there. This can
> explain the 502 when the other side rejected your request.

I have changed it but still no luck.

Should it be possible to have a fe with h1 and a be server with h2 (alpn h2)?
I expect this or a similar response when I go through haproxy.

I haven't seen any log option to get the backend request method, I think this 
should be a feature request ;-).


curl -vo /dev/null https://mail.google.com:443
*   Trying 172.217.21.229...
* Connected to mail.google.com (172.217.21.229) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=mail.google.com,O=Google LLC,L=Mountain 
View,ST=California,C=US
*   start date: Dec 19 08:16:00 2018 GMT
*   expire date: Mar 13 08:16:00 2019 GMT
*   common name: mail.google.com
*   issuer: CN=Google Internet Authority G3,O=Google Trust Services,C=US
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: mail.google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: /mail/
< Expires: Wed, 23 Jan 2019 20:01:34 GMT
< Date: Wed, 23 Jan 2019 20:01:34 GMT
< Cache-Control: private, max-age=7776000
< Content-Type: text/html; charset=UTF-8
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: GSE
< Alt-Svc: clear
< Accept-Ranges: none
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
{ [data not shown]
* Connection #0 to host mail.google.com left intact


Config is now this.

###
cat /tmp/haproxy.cfg
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3
global
  # nodaemon

  log stdout format rfc5424 daemon "${LOGLEVEL}"

  stats socket /tmp/sock1 mode 666 level admin
  stats timeout 1h
  tune.ssl.default-dh-param 2048
  ssl-server-verify none

  nbthread "${NUM_THREADS}"


defaults
  log global

# the format is described at
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4

# copied from
# 
https://github.com/haproxytech/haproxy-docker-arm64v8/blob/master/cfg_files/haproxy.cfg
  retries 3
  timeout http-request10s
  timeout queue   1m
  timeout connect 10s
  timeout client  1m
  timeout server  1m
  timeout http-keep-alive 10s
  timeout check   10s
  maxconn 3000

  default-server resolve-prefer ipv4 inter 5s resolvers mydns
  option http-use-htx
  option httplog

  log-format ">>> %ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS 
%tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %rt %sslv %sslc"

resolvers mydns
  nameserver dns1 "${DNS_SRV001}":53
  nameserver dns2 "${DNS_SRV002}":53
  resolve_retries   3
  timeout retry 1s
  hold valid   10s

listen stats
bind :"${STATS_PORT}"
mode http
# Health check monitoring uri.
monitor-uri /healthz

# Add your custom health check monitoring failure condition here.
# monitor fail if 
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth "${STATS_USER}":"${STATS_PASSWORD}"

frontend public_tcp
  bind :"${SERVICE_TCP_PORT}" alpn h2,http/1.1

  mode http
  log global

  default_backend be_generic_tcp


backend be_generic_tcp
  mode http
  balance source
  timeout check 5s
  option tcp-check

  server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check inter 
5s alpn h2 ssl ssl-min-ver TLSv1.3 verify none
###

Log of haproxy

<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy stats started.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy public_tcp started.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy be_generic_tcp 
started.
[WARNING] 022/200030 (1) : be_generic_tcp/google-mail changed its IP from 
172.217.21.229 to 172.217.18.165 by mydns/dns1.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - 
be_generic_tcp/google-mail changed its IP from 172.217.21.229 to 172.217.18.165 
by mydns/dns1.

:public_tcp.accept(0006)=000c from [127.0.0.1:54308] ALPN=
:public_tcp.clireq[000c:]: GET / HTTP/1.1
:public_tcp.clihdr[000c:]: user-agent: curl/7.29.0
:public_tcp.clihdr[000c:]: host: 127.0.0.1:8443
:public_tcp.clihdr[000c:]: accept: */*
:be_generic_tcp.srvcls[000c:0021]
:be_g

Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Janusz Dziemidowicz
Wed, 23 Jan 2019 at 11:53, Janusz Dziemidowicz wrote:
> 1.14.2 is current version in Debian testing. Debian seems reluctant to
> use "mainline" nginx versions (1.15.x) so 1.14.x might end in Debian
> 10. I'll try to file Debian bug report later today.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920297

-- 
Janusz Dziemidowicz



Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Willy Tarreau
Hi Aleks,

On Wed, Jan 23, 2019 at 06:58:25PM +0100, Aleksandar Lazic wrote:
> backend be_generic_tcp
>   mode http
>   balance source
>   timeout check 5s
>   option tcp-check
> 
>   server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check 
> inter 5s proto h2 ssl ssl-min-ver TLSv1.3 verify none

You need to replace "proto h2" with "alpn h2", so that the application
protocol is announced to the other host, otherwise it will stick to the
default, very likely "http/1.1", while haproxy talks h2 there. This can
explain the 502 when the other side rejected your request.
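
The difference can be sketched with two server lines (address and names are placeholders, not from the original config):

```
backend be_generic_tcp
  mode http
  # "alpn h2": h2 is negotiated with the server via TLS ALPN (what is wanted here)
  server s1 192.0.2.1:443 ssl alpn h2 verify none
  # "proto h2" would force haproxy's h2 mux without announcing it, so the
  # server may still speak http/1.1 on the connection and reject the request
  #server s1 192.0.2.1:443 ssl proto h2 verify none
```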

Willy



Re: [PATCH] runtime do-resolve http action

2019-01-23 Thread Willy Tarreau
Hi Baptiste,

On Wed, Jan 23, 2019 at 02:00:58PM +0100, Baptiste wrote:
> Hi Willy,
> 
> Please find attached to this email a set of 4 patches which add a new HTTP
> action that can use a dns resolver section to perform a DNS resolution
> based on the output of a fetch.
> The use case is split DNS situations or with highly dynamic environment
> where servers behind HAProxy are just ephemeral services.

Ah thanks for having rebased them.

I have some comments below, some purely cosmetic, some less :

> diff --git a/include/types/stream.h b/include/types/stream.h
> index 5e854c5..02eacd9 100644
> --- a/include/types/stream.h
> +++ b/include/types/stream.h
> @@ -119,6 +119,7 @@ struct strm_logs {
>  };
>  
>  struct stream {
> + enum obj_type obj_type; /* object type == OBJ_TYPE_STREAM */

Here this drills a 7-bytes hole between obj_type and flags. It would be
better to move this field elsewhere in the struct where there's a hole
already.

>   int flags;  /* some flags describing the stream */
>   unsigned int uniq_id;   /* unique ID used for the traces */
>   enum obj_type *target;  /* target to use for this stream */
> -- 


> From 077ea8af588e0f0ac2ac4070d514e27c6dac57c9 Mon Sep 17 00:00:00 2001
> From: Baptiste Assmann 
> Date: Mon, 21 Jan 2019 08:34:50 +0100
> Subject: [PATCH 4/4] MINOR: action: new 'http-request do-resolve' action
> 
> The 'do-resolve' action is an http-request action which allows to run
> DNS resolution at run time in HAProxy.
> The name to be resolved can be picked up in the request sent by the
> client and the result of the resolution is stored in a variable.
> While the resolution is being performed, the request is paused.
> If the resolution can't provide a suitable result, then the variable
> will be empty. It's up to the admin to take decisions based on this
> statement (return 503 to prevent loops).
> 
> Read carefully the documentation concerning this feature, to ensure your
> setup is secure and safe to be used in production.
> ---
>  doc/configuration.txt  |  54 +-
>  include/proto/action.h |   3 +
>  include/proto/dns.h|   2 +
>  include/types/action.h |   8 ++
>  include/types/stream.h |  10 ++
>  src/action.c   |  34 +++
>  src/cfgparse.c |  18 
>  src/dns.c  | 266 
> +
>  src/proto_http.c   |   9 +-
>  src/stream.c   |  11 ++
>  10 files changed, 407 insertions(+), 8 deletions(-)
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index 2a7efe9..0155274 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -4064,7 +4064,6 @@ http-check send-state
>  
>See also : "option httpchk", "http-check disable-on-404"
>  
> -
>  http-request  [options...] [ { if | unless }  ]
>Access control for Layer 7 requests
>  
> @@ -4219,6 +4218,59 @@ http-request deny [deny_status ] [ { if | 
> unless }  ]
>those that can be overridden by the "errorfile" directive.
>No further "http-request" rules are evaluated.
>  
> +http-request do-resolve(,,[ipv4,ipv6])  :
> +  This action performs a DNS resolution of the output of  and stores
> +  the result in the variable . It uses the DNS resolvers section
> +  pointed by .
> +  It is possible to choose a resolution preference using the optional
> +  arguments 'ipv4' or 'ipv6'.
> +  When performing the DNS resolution, the client side connection is
> +  paused, waiting for the end of the resolution.
> +  If an IP address can be found, it is stored into . If any kind of
> +  error occurs, then  is not set.

Just to be sure, it is not set or not modified ? I guess the latter, which
is fine.
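
For reference, usage of the action described above could look roughly like this (the variable, resolvers and section names are illustrative, and the syntax is the one proposed in the patch, not necessarily final):

```
resolvers mydns
  nameserver dns1 192.0.2.53:53

frontend fe_main
  # resolve the Host header at run time, storing the result in txn.dstip
  http-request do-resolve(txn.dstip,mydns,ipv4) hdr(Host),lower
  # return 503 if the resolution failed, as the doc suggests, to prevent loops
  http-request deny deny_status 503 unless { var(txn.dstip) -m found }
```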

> diff --git a/include/types/stream.h b/include/types/stream.h
> index 02eacd9..26a5e4a 100644
> --- a/include/types/stream.h
> +++ b/include/types/stream.h
> @@ -179,6 +179,16 @@ struct stream {
>   struct list *current_rule_list; /* this is used to store the 
> current executed rule list. */
>   void *current_rule; /* this is used to store the 
> current rule to be resumed. */
>   struct hlua *hlua;  /* lua runtime context */
> +
> + /* Context */
> + union {
> + struct {
> + struct dns_requester *dns_requester;
> + char *hostname_dn;
> + int hostname_dn_len;
> + struct act_rule *parent;
> + } dns;
> + } ctx;

History has told us that every single time we created a union with a single
field inside hoping to reuse it later, we never reused it. Thus better
directly put the structure and call it "dns_ctx". It will also be clearer
because "ctx" or "context" are unclear here given that a stream *is* a
context already, so you have a generic context in a context. Also, given
that you have a 4-bytes hole after hostname_dn_len, maybe it could make
sense to place your obj_type there.

Re: haproxy 1.9.2 with boringssl

2019-01-23 Thread Aleksandar Lazic
Hi.

After some tricky stuff with CentOS I switched to Debian as the base image and
was now able to build haproxy with boringssl.


/usr/local/sbin/haproxy -vv
HA-Proxy version 1.9.2 2019/01/16 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-Wno-unused-label -Wno-sign-compare -Wno-unused-parameter 
-Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered 
-Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value 
-Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_LINUX_SPLICE=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 
USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_TFO=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : BoringSSL
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.22 2016-07-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace


Now I want to try to make the request to mail.google.com with this config and 
runtime.

###
cat /tmp/haproxy.cfg
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3
global
  # nodaemon

  log stdout format rfc5424 daemon "${LOGLEVEL}"

  stats socket /tmp/sock1 mode 666 level admin
  stats timeout 1h
  tune.ssl.default-dh-param 2048
  ssl-server-verify none

  nbthread "${NUM_THREADS}"


defaults
  log global

# the format is described at
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4

# copied from
# 
https://github.com/haproxytech/haproxy-docker-arm64v8/blob/master/cfg_files/haproxy.cfg
  retries 3
  timeout http-request10s
  timeout queue   1m
  timeout connect 10s
  timeout client  1m
  timeout server  1m
  timeout http-keep-alive 10s
  timeout check   10s
  maxconn 3000

  default-server resolve-prefer ipv4 inter 5s resolvers mydns
  option http-use-htx

resolvers mydns
  nameserver dns1 "${DNS_SRV001}":53
  nameserver dns2 "${DNS_SRV002}":53
  resolve_retries   3
  timeout retry 1s
  hold valid   10s

listen stats
bind :"${STATS_PORT}"
mode http
# Health check monitoring uri.
monitor-uri /healthz

# Add your custom health check monitoring failure condition here.
# monitor fail if 
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth "${STATS_USER}":"${STATS_PASSWORD}"

frontend public_tcp
  bind :"${SERVICE_TCP_PORT}"

  mode http
  option httplog
  log global

  default_backend be_generic_tcp


backend be_generic_tcp
  mode http
  balance source
  timeout check 5s
  option tcp-check

  server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check inter 
5s proto h2 ssl ssl-min-ver TLSv1.3 verify none
###

Test with curl
###
curl -v http://127.0.0.1:8443
* About to connect() to 127.0.0.1 port 8443 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8443 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8443
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Bad Gateway
< cache-control: no-cache
< content-type: text/html
<
502 Bad Gateway
The server returned an invalid or incomplete response.

* Closing connection 0
###

# podman.io instead of docker
podman run --rm -it -e LOGLEVEL=debug -e NUM_THREADS=8 -e DNS_SRV001=1.1.1.1 -e 
DNS_SRV002=8.8.8.8 \
   -e STATS_PORT=7411 -e STATS_USER=test -e STATS_PASSWORD=test -e 
SERVICE_TCP_PORT=8443 \
   -e SERVICE_NAME=google-mail -e SERVICE_DEST_IP=mail.google.com -e 
SERVICE_DEST_PORT=443 \
   -e CONFIG_FILE=/mnt/haproxy.cfg -v /tmp/:/mnt/ -p 8443 --expose 8443 
--net host \
me2digital/haproxy-19-boringssl

using CONFIG_FILE   :/mnt/haproxy.cfg
<29>1 2019-01-23T17:50:45+00:00 doh-001 haproxy 1 - - Proxy stats started.
<29>1 2019-01-23T17:50:4

Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

This is all very good to hear. I'm glad you were able to get to the bottom of 
it all!

Feel free to send along patches if you want me to test before the 1.9.3 
release. I'm more than happy to do so.

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 6:02 PM, Willy Tarreau  wrote:

> Hi Luke,
> 

> On Wed, Jan 23, 2019 at 10:47:33AM +, Luke Seelenbinder wrote:
> 

> > We were using http-reuse always and experiencing this
> > issue (as well as getting 80+% connection reuse). When I scaled it back to
> > http-reuse safe, the frequency of this issue seemed to be much lower.
> > (Perhaps because the bulk of my testing was with one client and somewhat
> > unscientific?)
> 

> It could be caused by various things. In my tests the client doesn't even
> use keep-alive so haproxy is less aggressive with connection reuse and
> that could explain some differences.
> 

> > > Thus it
> > > definitely is a matter of bad interaction between two streams, or one
> > > stream affecting the connection and hurting the other stream.
> > 

> > My debugging spidery-sense points to the same thing.
> 

> So I have more info now. There are multiple issues which stack up and
> cause this :
> 

> -   the GOAWAY frame indicating the last stream id might be in flight
> while many more streams have been added. This results in batch
> deaths once the limit is met ;
> 

> -   the last stream ID received in the GOAWAY frame was not considered
> when calculating the number of available streams, leading to more
> streams being created than the server would accept ;
> 

> -   there is an issue with how new streams are attached to idle connections
> making them non-retryable in case of a failure such as above. I managed
> to fix this but it still requires some testing with other configs ;
> 

> -   another issue affects idle connections, some of them could remain
> in the idle list while they don't have room anymore because they
> are removed only when they deliver the last stream, thus the check
> doesn't support jumps in the number of available streams ; I suspect
> it could be related to the client aborts that cause server aborts,
> just because it allowed some excess streams to be sent to a mux which
> doesn't have room anymore, but I could be wrong ;
> 

> And a less important one : the maximum number of concurrent streams per
> connection is global. In this case it's 100 so it's lower than nginx's
> 128 thus it doesn't cause any issue. But we could run into problems with
> this and I must address this to make it per-connection.
> 

> With all these changes, I managed to run a long test with no more errors
> and only an immediate retry once in a while if nginx announced the GOAWAY
> too late. When we set the limit ourselves, there's not even any retry
> anymore. Thus I'll continue to work on this and we'll slightly delay 1.9.3
> to collect these fixes. From there we'll be able to see if you still have
> problems and iterate.
> 

> 

> > Let me know if you want me to share our config (it's quite complex) with you
> > privately or if there's anything else we can do to assist.
> 

> That's kind but now I don't need it anymore, I have everything needed to
> reproduce the whole issue it seems.
> 

> Thanks,
> Willy





Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Willy Tarreau
Hi Luke,

On Wed, Jan 23, 2019 at 10:47:33AM +, Luke Seelenbinder wrote:
> We were using http-reuse always and experiencing this
> issue (as well as getting 80+% connection reuse). When I scaled it back to
> http-reuse safe, the frequency of this issue seemed to be much lower.
> (Perhaps because the bulk of my testing was with one client and somewhat
> unscientific?)

It could be caused by various things. In my tests the client doesn't even
use keep-alive so haproxy is less aggressive with connection reuse and
that could explain some differences.

> > Thus it
> > definitely is a matter of bad interaction between two streams, or one
> > stream affecting the connection and hurting the other stream.
> 
> My debugging spidery-sense points to the same thing.

So I have more info now. There are multiple issues which stack up and
cause this :
  - the GOAWAY frame indicating the last stream id might be in flight
while many more streams have been added. This results in batch
deaths once the limit is met ;

  - the last stream ID received in the GOAWAY frame was not considered
when calculating the number of available streams, leading to more
streams being created than the server would accept ;

  - there is an issue with how new streams are attached to idle connections
making them non-retryable in case of a failure such as above. I managed
to fix this but it still requires some testing with other configs ;

  - another issue affects idle connections, some of them could remain
in the idle list while they don't have room anymore because they
are removed only when they deliver the last stream, thus the check
doesn't support jumps in the number of available streams ; I suspect
it could be related to the client aborts that cause server aborts,
just because it allowed some excess streams to be sent to a mux which
doesn't have room anymore, but I could be wrong ;

And a less important one : the maximum number of concurrent streams per
connection is global. In this case it's 100 so it's lower than nginx's
128 thus it doesn't cause any issue. But we could run into problems with
this and I must address this to make it per-connection.

With all these changes, I managed to run a long test with no more errors
and only an immediate retry once in a while if nginx announced the GOAWAY
too late. When we set the limit ourselves, there's not even any retry
anymore. Thus I'll continue to work on this and we'll slightly delay 1.9.3
to collect these fixes. From there we'll be able to see if you still have
problems and iterate.

> Let me know if you want me to share our config (it's quite complex) with you
> privately or if there's anything else we can do to assist.

That's kind but now I don't need it anymore, I have everything needed to
reproduce the whole issue it seems.

Thanks,
Willy



Re: Rate-limit relating to the healthy servers count

2019-01-23 Thread Thomas Hilaire

Hi,

I hadn't thought of such a trick; it works!

Thanks a lot!

On 23/01/2019 11:48, Jarno Huuskonen wrote:

Hi,

On Wed, Jan 23, Thomas Hilaire wrote:

Hi,

I want to implement a rate-limit system using the sticky table of
HAProxy. Consider that I have 100 servers, and a limit of 10
requests per server, the ACL would be:

     http-request track-sc0 int(1) table GlobalRequestsTracker
     http-request deny deny_status 429 if {
sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }

Now if I want to make this dynamic depending on the healthy servers
count, I need to replace the hardcoded `100` with the `nbsrv`
converter like this:

     http-request track-sc0 int(1) table GlobalRequestsTracker
     http-request deny deny_status 429 if {
sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10
}

But I'm getting the error:

     error detected while parsing an 'http-request deny' condition :
invalid args in converter 'div' : expects an integer or a variable
name in ACL expression
'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.

Is there a way to use `nbsrv` as a variable inside the `div` operator?

Untested: does something like this work:

http-request set-var(req.dummy) nbsrv(MyBackend)
http-request deny deny_status 429 if { 
sc0_http_req_rate(GlobalRequestsTracker),div(req.dummy) gt 10 }

-Jarno



DDoS protection: ban clients with high HTTP error rates

2019-01-23 Thread Marco Colli
Hello!

I use HAProxy in front of a web app / service and I would like to add DDoS
protection and rate limiting. The problem is that each part of the
application has different request rates and for some customers we must
accept very high request rates and bursts, while this is not allowed for
unauthenticated users, for example. So I was thinking about this solution:

1. Based on advanced conditions (e.g. current user) our Rails application
decides whether to return a normal response (e.g. 2xx) or a 429 (Too Many
Requests); it can also return other errors, like 401
2. HAProxy bans clients if they produce too many 4xx errors

What do you think about this solution?
Also, is it correct to use HAProxy directly, or is it more performant to use
fail2ban on HAProxy logs?

This is the HAProxy configuration that I would like to use:

frontend www-frontend
  tcp-request connection reject if { src_http_err_rate(st_abuse) ge 5 }
  http-request track-sc0 src table st_abuse
  ...
  default_backend www-backend

backend www-backend
  ...

backend st_abuse
  stick-table type ipv6 size 1m expire 10s store http_err_rate(10s)



Do you think that the above rules are correct? Am I missing something?
Also, is it correct to mix *tcp*-request and src_*http*_err_rate in the
frontend?
Is it possible to include only the 4xx errors (and not 5xx) in
http_err_rate?
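One possible direction (an untested sketch; it assumes the `sc-inc-gpc0` action and `gpc0_rate` counters available in recent HAProxy versions) is to count only 4xx responses into a general-purpose counter instead of relying on http_err_rate:

```
backend st_abuse
  # store the counter and its rate so sc-inc-gpc0 updates both
  stick-table type ipv6 size 1m expire 10m store gpc0,gpc0_rate(10s)

frontend www-frontend
  http-request track-sc0 src table st_abuse
  # count only 4xx responses, ignoring 5xx
  http-response sc-inc-gpc0(0) if { status 400:499 }
  # reject clients exceeding 5 counted errors per 10s
  http-request deny deny_status 429 if { sc0_gpc0_rate(st_abuse) gt 5 }
```

Since the counter is only incremented on responses, the deny rule acts on the error rate accumulated by a client's previous requests.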


Any suggestion would be greatly appreciated
Thank you
Marco Colli


[PATCH] runtime do-resolve http action

2019-01-23 Thread Baptiste
Hi Willy,

Please find attached to this email a set of 4 patches which add a new HTTP
action that can use a dns resolver section to perform a DNS resolution
based on the output of a fetch.
The use case is split-DNS situations, or highly dynamic environments
where servers behind HAProxy are just ephemeral services.

Baptiste
From c3baea8c50a7dcbe4557c4a578fcbd252ffb7c56 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann 
Date: Tue, 30 Jan 2018 08:10:20 +0100
Subject: [PATCH 3/4] MINOR: obj_type: new object type for struct stream

This patch creates a new obj_type for the struct stream in HAProxy.
---
 include/proto/obj_type.h | 13 +
 include/types/obj_type.h |  1 +
 include/types/stream.h   |  1 +
 3 files changed, 15 insertions(+)

diff --git a/include/proto/obj_type.h b/include/proto/obj_type.h
index 47273ca..19865bb 100644
--- a/include/proto/obj_type.h
+++ b/include/proto/obj_type.h
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static inline enum obj_type obj_type(enum obj_type *t)
@@ -158,6 +159,18 @@ static inline struct dns_srvrq *objt_dns_srvrq(enum obj_type *t)
 	return __objt_dns_srvrq(t);
 }
 
+static inline struct stream *__objt_stream(enum obj_type *t)
+{
+	return container_of(t, struct stream, obj_type);
+}
+
+static inline struct stream *objt_stream(enum obj_type *t)
+{
+	if (!t || *t != OBJ_TYPE_STREAM)
+		return NULL;
+	return __objt_stream(t);
+}
+
 static inline void *obj_base_ptr(enum obj_type *t)
 {
 	switch (obj_type(t)) {
diff --git a/include/types/obj_type.h b/include/types/obj_type.h
index e141d69..9410718 100644
--- a/include/types/obj_type.h
+++ b/include/types/obj_type.h
@@ -41,6 +41,7 @@ enum obj_type {
 	OBJ_TYPE_CONN, /* object is a struct connection */
 	OBJ_TYPE_SRVRQ,/* object is a struct dns_srvrq */
 	OBJ_TYPE_CS,   /* object is a struct conn_stream */
+	OBJ_TYPE_STREAM,   /* object is a struct stream */
 	OBJ_TYPE_ENTRIES   /* last one : number of entries */
 } __attribute__((packed)) ;
 
diff --git a/include/types/stream.h b/include/types/stream.h
index 5e854c5..02eacd9 100644
--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -119,6 +119,7 @@ struct strm_logs {
 };
 
 struct stream {
+	enum obj_type obj_type; /* object type == OBJ_TYPE_STREAM */
 	int flags;  /* some flags describing the stream */
 	unsigned int uniq_id;   /* unique ID used for the traces */
 	enum obj_type *target;  /* target to use for this stream */
-- 
2.7.4

From 7f4b2ae2e0a98efd2fa162e906c4bb641732ae98 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann 
Date: Tue, 30 Jan 2018 08:08:04 +0100
Subject: [PATCH 2/4] MINOR: dns: move callback affection in
 dns_link_resolution()

In dns.c, dns_link_resolution(), each type of dns requester is managed
separately; that said, the callback function is assigned globally (and
points to server-type callbacks only).
This design prevents the addition of new dns requester types, and this
patch aims at fixing this limitation: now the callback setting is done
directly in the portion of code dedicated to each requester type.
---
 src/dns.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/dns.c b/src/dns.c
index f39f3ff..8ac5024 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -1397,6 +1397,9 @@ int dns_link_resolution(void *requester, int requester_type, int requester_locke
 			req = srv->dns_requester;
 		if (!requester_locked)
 			HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
+
+		req->requester_cb   = snr_resolution_cb;
+		req->requester_error_cb = snr_resolution_error_cb;
 	}
 	else if (srvrq) {
 		if (srvrq->dns_requester == NULL) {
@@ -1407,13 +1410,14 @@ int dns_link_resolution(void *requester, int requester_type, int requester_locke
 		}
 		else
 			req = srvrq->dns_requester;
+
+		req->requester_cb   = snr_resolution_cb;
+		req->requester_error_cb = snr_resolution_error_cb;
 	}
 	else
 		goto err;
 
 	req->resolution = res;
-	req->requester_cb   = snr_resolution_cb;
-	req->requester_error_cb = snr_resolution_error_cb;
 
 	LIST_ADDQ(&res->requesters, &req->list);
 	return 0;
-- 
2.7.4

From 077ea8af588e0f0ac2ac4070d514e27c6dac57c9 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann 
Date: Mon, 21 Jan 2019 08:34:50 +0100
Subject: [PATCH 4/4] MINOR: action: new 'http-request do-resolve' action

The 'do-resolve' action is an http-request action which allows running a
DNS resolution at run time in HAProxy.
The name to be resolved can be picked from the request sent by the
client, and the result of the resolution is stored in a variable.
While the resolution is being performed, the request is paused.
If the resolution can't provide a suitable result, the variable
will be empty. It's up to the admin to take decisions based on this
(e.g. return a 503 to prevent loops).

Read carefully the documentation concerning this feature, to ensure your
setup is secure and safe to be used in production.
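For illustration only, a configuration using the action might look like this (a sketch based on the description above; the exact syntax and variable scopes are assumptions, check the documentation shipped with the series):

```
resolvers mydns
    nameserver ns1 192.168.0.53:53

frontend fe_main
    bind *:8080
    # resolve the Host header at run time, store the result in a variable
    http-request do-resolve(txn.myip,mydns,ipv4) hdr(Host),lower
    # the variable is empty if resolution failed: return 503 to prevent loops
    http-request deny deny_status 503 unless { var(txn.myip) -m found }
    http-request set-dst var(txn.myip)
    default_backend be_dynamic

backend be_dynamic
    # destination address comes from set-dst above
    server dynamic 0.0.0.0:0
```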
---
 do

Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Janusz Dziemidowicz
On Wed, 23 Jan 2019 at 10:41, Lukas Tribus  wrote:
> > I tested all my servers and I've noticed that nginx is broken too. I
> > am running nginx 1.14.2 with OpenSSL 1.1.1a The nginx source contains
> > exactly the same function as haproxy:
> > https://trac.nginx.org/nginx/browser/nginx/src/event/ngx_event_openssl.c?rev=ebf8c9686b8ce7428f975d8a567935ea3722da70#L850
> >
> > However, it seems that it might have been fixed in 1.15.2 by this commit:
> > https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c
>
> Thanks for this. It's actually nginx 1.15.4 (September 2018) where
> this commit is present.

Yes, typed too fast ;)

> Are nginx folks aware of the problem? It would probably be wise for
> them to backport the fix to their 1.14 tree ...

1.14.2 is the current version in Debian testing. Debian seems reluctant to
use "mainline" nginx versions (1.15.x), so 1.14.x might end up in Debian
10. I'll try to file a Debian bug report later today.

-- 
Janusz Dziemidowicz



Re: Rate-limit relating to the healthy servers count

2019-01-23 Thread Jarno Huuskonen
Hi,

On Wed, Jan 23, Thomas Hilaire wrote:
> Hi,
> 
> I want to implement a rate-limit system using the sticky table of
> HAProxy. Consider that I have 100 servers, and a limit of 10
> requests per server, the ACL would be:
> 
>     http-request track-sc0 int(1) table GlobalRequestsTracker
>     http-request deny deny_status 429 if {
> sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }
> 
> Now if I want to make this dynamic depending on the healthy servers
> count, I need to replace the hardcoded `100` per the `nbsrv`
> converter like this:
> 
>     http-request track-sc0 int(1) table GlobalRequestsTracker
>     http-request deny deny_status 429 if {
> sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10
> }
> 
> But I'm getting the error:
> 
>     error detected while parsing an 'http-request deny' condition :
> invalid args in converter 'div' : expects an integer or a variable
> name in ACL expression
> 'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.
> 
> Is there a way to use `nbsrv` as a variable inside the `div` operator?

Untested: does something like this work:

http-request set-var(req.dummy) nbsrv(GlobalRequestsTracker)
http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(req.dummy) gt 10 }

-Jarno

-- 
Jarno Huuskonen
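Assembled into a fuller (untested) sketch; note that the original question tracked the healthy servers of `MyBackend`, so `nbsrv` should likely point there rather than at the tracker table:

```
backend GlobalRequestsTracker
    stick-table type integer size 1 expire 1h store http_req_rate(10s)

frontend fe_main
    http-request track-sc0 int(1) table GlobalRequestsTracker
    # copy nbsrv into a variable, since div() takes an integer or a variable
    http-request set-var(req.nbsrv) nbsrv(MyBackend)
    http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(req.nbsrv) gt 10 }
```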



Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

> When using "http-reuse always" the issue disappears and I
> can never get any issue at all. Now that I've fixed this, I'm seeing the
> issue with the SD flags.

Now that's interesting. We were using http-reuse always and experiencing this 
issue (as well as getting 80+% connection reuse). When I scaled it back to 
http-reuse safe, the frequency of this issue seemed to be much lower. (Perhaps 
because the bulk of my testing was with one client and somewhat unscientific?)

> Thus it
> definitely is a matter of bad interaction between two streams, or one
> stream affecting the connection and hurting the other stream.

My debugging spidery-sense points to the same thing. Let me know if you want me 
to share our config (it's quite complex) with you privately or if there's 
anything else we can do to assist.

> I now have something to dig into.

:-)

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 11:39 AM, Willy Tarreau  wrote:

> On Wed, Jan 23, 2019 at 11:09:53AM +0100, Willy Tarreau wrote:
> 

> > On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote:
> > 

> > > > I've placed an nginx instance after my local haproxy dev config, and
> > > > found something which might explain what you're observing : the process
> > > > apparently leaks FDs and fails once in a while, causing 500 to be 
> > > > returned :
> > > 

> > > That's fascinating. I would have thought nginx would have had a bit better
> > > care given to things like that. . .
> > 

> > Well, it's possible I'm hitting a corner case. I don't want to blame nginx
> > for such situations, we all have our share of crap when it comes to error
> > handling :-)
> 

> Actually I have to stand corrected, the issue is with our idle connection
> management. For some reason we pile up new connections instead of reusing
> the previous ones and the nginx process fails to stand extra ones past a
> certain point. When using "http-reuse always" the issue disappears and I
> can never get any issue at all. Now that I've fixed this, I'm seeing the
> issue with the SD flags. I don't have this one in the specific case where
> I only have one client at a time, though there's still some reuse. Thus it
> definitely is a matter of bad interaction between two streams, or one
> stream affecting the connection and hurting the other stream.
> 

> I now have something to dig into.
> 

> Thanks,
> Willy



publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 10:40:09AM +0100, Lukas Tribus wrote:
> Also, we need a big fat warning that all TLSv1.3 users must upgrade in
> the next 1.8 and 1.9 stable version announcement containing this fix.

That's a good point, this will also encourage distro maintainers to
update their versions.

> I have filed a tracking bug for this, which can be closed when backported:
> https://github.com/haproxy/haproxy/issues/24
> 
> Closed or not, the tracking bug makes this easier to find.

Thanks!

Willy



Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 11:09:53AM +0100, Willy Tarreau wrote:
> On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote:
> > > I've placed an nginx instance after my local haproxy dev config, and
> > > found something which might explain what you're observing : the process
> > > apparently leaks FDs and fails once in a while, causing 500 to be 
> > > returned :
> > 
> > That's fascinating. I would have thought nginx would have had a bit better
> > care given to things like that. . .
> 
> Well, it's possible I'm hitting a corner case. I don't want to blame nginx
> for such situations, we all have our share of crap when it comes to error
> handling :-)

Actually I have to stand corrected, the issue is with our idle connection
management. For some reason we pile up new connections instead of reusing
the previous ones and the nginx process fails to stand extra ones past a
certain point. When using "http-reuse always" the issue disappears and I
can never get any issue at all. Now that I've fixed this, I'm seeing the
issue with the SD flags. I don't have this one in the specific case where
I only have one client at a time, though there's still some reuse. Thus it
definitely is a matter of bad interaction between two streams, or one
stream affecting the connection and hurting the other stream.

I now have something to dig into.

Thanks,
Willy



Rate-limit relating to the healthy servers count

2019-01-23 Thread Thomas Hilaire

Hi,

I want to implement a rate-limit system using the sticky table of 
HAProxy. Consider that I have 100 servers, and a limit of 10 requests 
per server, the ACL would be:


    http-request track-sc0 int(1) table GlobalRequestsTracker
    http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }


Now if I want to make this dynamic depending on the healthy servers 
count, I need to replace the hardcoded `100` with the `nbsrv` converter
like this:


    http-request track-sc0 int(1) table GlobalRequestsTracker
    http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10 }


But I'm getting the error:

    error detected while parsing an 'http-request deny' condition : invalid args in converter 'div' : expects an integer or a variable name in ACL expression 'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.


Is there a way to use `nbsrv` as a variable inside the `div` operator?

Thanks a lot!




Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote:
> > I've placed an nginx instance after my local haproxy dev config, and
> > found something which might explain what you're observing : the process
> > apparently leaks FDs and fails once in a while, causing 500 to be returned :
> 
> That's fascinating. I would have thought nginx would have had a bit better
> care given to things like that. . .

Well, it's possible I'm hitting a corner case. I don't want to blame nginx
for such situations, we all have our share of crap when it comes to error
handling :-)

> Oddly enough, I cannot find any log entries that approximate this. However,
> it's possible since we're primarily (99+%) using nginx as a reverse-proxy
> that the fd issues wouldn't appear for us.

OK, I just deployed it with the default config and added "http2" at the end
of the "listen :443 ssl" line.

> My next thought is to try tcpdump to try to determine what's on the wire when
> the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might
> prove difficult. Any suggestions?

For me it's a pain when there's SSL in the mix. Some here on the list
know how to extract the master key and use it to decipher the
traffic, but I don't know how to do this. As an alternative, nginx supports
H2 in clear, so if there's a path where you can disable SSL (e.g. on hosts
where haproxy and nginx are on the same machine), then you can communicate
in clear by having "proto h2" on the haproxy's server line and "http2" in
the nginx config, both without the "ssl" keyword.
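A minimal sketch of that clear-text setup (hypothetical names and ports):

```
# haproxy: speak clear-text HTTP/2 to the backend, no "ssl" keyword
backend ngx
    server ngx1 127.0.0.1:8080 proto h2

# nginx: accept h2c on a plain listener
server {
    listen 8080 http2;
    root /usr/local/nginx/html;
}
```

As for deciphering TLS captures: some clients (Firefox, Chrome, recent curl) honor the SSLKEYLOGFILE environment variable, which produces a key log file that Wireshark can use to decrypt the capture.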

> One more interesting piece of data: if we use htx without h2 on the backends,
> we only see CD-- entries consistently (with a very, very few SD-- entries).
> Thus, it would seem whatever is causing the issue is directly related to h2
> backends. I further think we can safely say it is directly related to h2
> streams breaking (due to client-side request cancellations) resulting in the
> whole connection breaking in HAProxy or nginx (though determining which will
> be the trick).

I'm pretty sure you found an issue in haproxy related to the way these
requests are aborted. It's just that trying to reproduce this, I'm first
hitting other issues in the way related to the limitations above :-)

> There's also a strong possibility we replace nginx with HAProxy entirely for
> our SSL + H2 setup as we overhaul the backends, so this problem will probably
> be resolved by removing the problematic interaction.

One more reason for me to speed up figuring out what's happening before it
becomes too hard to reproduce it!

> I'm still working on running h2load against our nginx servers to see if that
> turns anything up.

Great, thanks!

> > And at this point the connection is closed and reopened for new requests.
> > There's never any GOAWAY sent.
> 
> If I'm understanding this correctly, that implies as long as nginx sends
> GOAWAY properly, HAProxy will not attempt to reuse the connection?

I've discovered that it's not the case, contrary to what I thought (I have
the patch for this, still just testing it). That's how I ended up finding
all this mess, because my nginx never sends me a GOAWAY and sees failures
before.

> > I managed to work around the problem by limiting the number of total
> > requests per connection. I find this extremely dirty but if it helps...
> > I just need to figure how to best do it, so that we can use it as well
> > for H2 as for H1.
> 
> We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will
> probably stick with that for now, since we don't want to have any more
> operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per
> se, just a business reality. :-) ) I'm more than happy to try out anything
> you turn up on our staging setup!

You're absolutely right on this and don't need to justify your choices.
For us having H2 on the backend is only a matter of completeness. While
it does make sense for those deploying CDNs for example, or those dealing
with APIs, on the local network it doesn't bring any real benefit and
further increases the risk of head-of-line blocking due to the shared
connection. And it indeed increases the risk of facing early bugs in
products. Both haproxy's and nginx's HTTP/1 stacks are proven and rock
solid, so you're clearly taking less risks with this.

Regards,
Willy



Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Aleksandar Lazic
Hi Lukas.

Am 23.01.2019 um 10:24 schrieb Luke Seelenbinder:
> Hi Willy,
> 
> Thanks for continuing to look into this. 
> 
>>
> 
>> I've placed an nginx instance after my local haproxy dev config, and
>> found something which might explain what you're observing : the process
>> apparently leaks FDs and fails once in a while, causing 500 to be returned :
> 
> That's fascinating. I would have thought nginx would have had a bit better 
> care given to things like that. . .

This can be fixed by increasing the ulimits ;-).

> Oddly enough, I cannot find any log entries that approximate this. However, 
> it's possible since we're primarily (99+%) using nginx as a reverse-proxy 
> that the fd issues wouldn't appear for us.

What's your ulimit for the nginx process?
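To check what limit actually applies to a running process, /proc can be consulted (a sketch; `$$` stands in for the nginx worker pid):

```shell
# Per-process fd limit as the kernel sees it (replace $$ with the worker pid)
grep 'Max open files' "/proc/$$/limits"

# Soft limit inherited by children of this shell
ulimit -n
```

nginx can also raise its own limit with the `worker_rlimit_nofile` directive, independently of the shell that started it.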

> My next thought is to try tcpdump to try to determine what's on the wire when 
> the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might 
> prove difficult. Any suggestions?

If you have enough log space, you can try to activate debug logging in nginx and
haproxy.

https://nginx.org/en/docs/debugging_log.html
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#log => debug

This will have some impact on performance, as every request creates a lot
of log lines!

It would be interesting to see which error you have in the nginx log when the
CD/SD flags happen, as 'http2 flood detected' is not in the logs.

Which release of nginx do you use?
http://hg.nginx.org/nginx/tags

Maybe there are some errors in the log which can be found in this directory.
http://hg.nginx.org/nginx/file/release-1.15.8/src/http/v2/

> One more interesting piece of data: if we use htx without h2 on the backends, 
> we only see CD-- entries consistently (with a very, very few SD-- entries). 
> Thus, it would seem whatever is causing the issue is directly related to h2 
> backends. I further think we can safely say it is directly related to h2 
> streams breaking (due to client-side request cancellations) resulting in the 
> whole connection breaking in HAProxy or nginx (though determining which will 
> be the trick).
> 
> There's also a strong possibility we replace nginx with HAProxy entirely for 
> our SSL + H2 setup as we overhaul the backends, so this problem will probably 
> be resolved by removing the problematic interaction.

What was the main reason to use nginx between haproxy and the backends?
What are the backends?

Regards
Aleks

> I'm still working on running h2load against our nginx servers to see if that 
> turns anything up.
> 
>> And at this point the connection is closed and reopened for new requests.
>> There's never any GOAWAY sent.
> 
> If I'm understanding this correctly, that implies as long as nginx sends 
> GOAWAY properly, HAProxy will not attempt to reuse the connection?
> 
>> I managed to work around the problem by limiting the number of total
>> requests per connection. I find this extremely dirty but if it helps...
>> I just need to figure how to best do it, so that we can use it as well
>> for H2 as for H1.
> 
> We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will 
> probably stick with that for now, since we don't want to have any more 
> operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per 
> se, just a business reality. :-) ) I'm more than happy to try out anything 
> you turn up on our staging setup!
> 
> Best,
> Luke
> 
> 
> —
> Luke Seelenbinder
> Stadia Maps | Founder
> stadiamaps.com
> 
> ‐‐‐ Original Message ‐‐‐
> On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau  wrote:
> 
>> Hi Luke,
>>
> 
>> I've placed an nginx instance after my local haproxy dev config, and
>> found something which might explain what you're observing : the process
>> apparently leaks FDs and fails once in a while, causing 500 to be returned :
>>
> 
>> 2019/01/23 08:22:13 [crit] 25508#0: *36705 open() 
>> "/usr/local/nginx/html/index.html" failed (24: Too many open files), client: 
>> 1>
>> 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open 
>> files)
>>
> 
>> 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-" 
>> "Mozilla/4.0 (compatible; MSIE 7.01; Windows)"
>>
> 
>> The ones are seen by haproxy :
>>
> 
>> 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500 
>> 701 - -  1/1/0/0/0 0/0 "GET / HTTP/1.1"
>>
> 
>> And at this point the connection is closed and reopened for new requests.
>> There's never any GOAWAY sent.
>>
> 
>> I managed to work around the problem by limiting the number of total
>> requests per connection. I find this extremely dirty but if it helps...
>> I just need to figure how to best do it, so that we can use it as well
>> for H2 as for H1.
>>
> 
>> Best regards,
>> Willy
> 




Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Lukas Tribus
On Wed, 23 Jan 2019 at 09:52, Willy Tarreau  wrote:
>
> On Wed, Jan 23, 2019 at 12:07:04AM -0800, Dirkjan Bussink wrote:
> > Of course, you're right. New version of the patch attached!
>
> Now merged, thank you!

It's obvious, but because the commit message doesn't explicitly mention it:
This must be backported to 1.8.

Also, we need a big fat warning that all TLSv1.3 users must upgrade in
the next 1.8 and 1.9 stable version announcement containing this fix.


I have filed a tracking bug for this, which can be closed when backported:
https://github.com/haproxy/haproxy/issues/24

Closed or not, the tracking bug makes this easier to find.


> I tested all my servers and I've noticed that nginx is broken too. I
> am running nginx 1.14.2 with OpenSSL 1.1.1a The nginx source contains
> exactly the same function as haproxy:
> https://trac.nginx.org/nginx/browser/nginx/src/event/ngx_event_openssl.c?rev=ebf8c9686b8ce7428f975d8a567935ea3722da70#L850
>
> However, it seems that it might have been fixed in 1.15.2 by this commit:
> https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c

Thanks for this. It's actually nginx 1.15.4 (September 2018) where
this commit is present.

Are nginx folks aware of the problem? It would probably be wise for
them to backport the fix to their 1.14 tree ...


> And just for reference, I've found Chrome bug with this problem (as I
> am interested when this will get enabled to keep all my systems
> updated) https://bugs.chromium.org/p/chromium/issues/detail?id=923685

Thanks, will subscribe to this bug also.


Regards,
Lukas



Re: H2 Server Connection Resets (1.9.2)

2019-01-23 Thread Luke Seelenbinder
Hi Willy,

Thanks for continuing to look into this. 


> 

> I've placed an nginx instance after my local haproxy dev config, and
> found something which might explain what you're observing : the process
> apparently leaks FDs and fails once in a while, causing 500 to be returned :

That's fascinating. I would have thought nginx would have had a bit better care 
given to things like that. . .

Oddly enough, I cannot find any log entries that approximate this. However, 
it's possible since we're primarily (99+%) using nginx as a reverse-proxy that 
the fd issues wouldn't appear for us.

My next thought is to use tcpdump to determine what's on the wire when
the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might
prove difficult. Any suggestions?

One more interesting piece of data: if we use htx without h2 on the backends, 
we only see CD-- entries consistently (with a very, very few SD-- entries). 
Thus, it would seem whatever is causing the issue is directly related to h2 
backends. I further think we can safely say it is directly related to h2 
streams breaking (due to client-side request cancellations) resulting in the 
whole connection breaking in HAProxy or nginx (though determining which will be 
the trick).

There's also a strong possibility we replace nginx with HAProxy entirely for 
our SSL + H2 setup as we overhaul the backends, so this problem will probably 
be resolved by removing the problematic interaction.

I'm still working on running h2load against our nginx servers to see if that 
turns anything up.

> And at this point the connection is closed and reopened for new requests.
> There's never any GOAWAY sent.

If I'm understanding this correctly, that implies as long as nginx sends GOAWAY 
properly, HAProxy will not attempt to reuse the connection?

> I managed to work around the problem by limiting the number of total
> requests per connection. I find this extremely dirty but if it helps...
> I just need to figure how to best do it, so that we can use it as well
> for H2 as for H1.

We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will 
probably stick with that for now, since we don't want to have any more 
operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per se, 
just a business reality. :-) ) I'm more than happy to try out anything you turn 
up on our staging setup!

Best,
Luke


—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau  wrote:

> Hi Luke,
> 

> I've placed an nginx instance after my local haproxy dev config, and
> found something which might explain what you're observing : the process
> apparently leaks FDs and fails once in a while, causing 500 to be returned :
> 

> 2019/01/23 08:22:13 [crit] 25508#0: *36705 open() 
> "/usr/local/nginx/html/index.html" failed (24: Too many open files), client: 
> 1>
> 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open files)
> 

> 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-" 
> "Mozilla/4.0 (compatible; MSIE 7.01; Windows)"
> 

> The ones are seen by haproxy :
> 

> 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500 
> 701 - -  1/1/0/0/0 0/0 "GET / HTTP/1.1"
> 

> And at this point the connection is closed and reopened for new requests.
> There's never any GOAWAY sent.
> 

> I managed to work around the problem by limiting the number of total
> requests per connection. I find this extremely dirty but if it helps...
> I just need to figure how to best do it, so that we can use it as well
> for H2 as for H1.
> 

> Best regards,
> Willy



publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature


Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Willy Tarreau
On Wed, Jan 23, 2019 at 12:07:04AM -0800, Dirkjan Bussink wrote:
> Of course, you're right. New version of the patch attached!

Now merged, thank you!
Willy



Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.

2019-01-23 Thread Dirkjan Bussink
Hi Willy,

On 22 Jan 2019, at 23:17, Willy Tarreau  wrote:
> 
> As you can see it will enable this code when SSL_OP_NO_RENEGOTIATION=0,
> which is what BoringSSL does and it needs this code to be disabled. Thus
> I think it's better to simply do this :
> 
> +#ifndef SSL_OP_NO_RENEGOTIATION
> + /* Please note that BoringSSL defines this macro to zero so don't
> +  * change this to #if and do not assign a default value to this macro!
> +  */
> 

Of course, you’re right. New version of the patch attached!

Cheers,

Dirkjan




0001-BUG-MEDIUM-ssl-Fix-handling-of-TLS-1.3-KeyUpdate-mes.patch
Description: Binary data