Re: reg-tests situation in haproxy 1.8
On Mon, Jan 21, 2019 at 03:28:35PM +0100, Frederic Lecaille wrote:
> On 1/19/19 8:53 AM, Willy Tarreau wrote:
> > I was interested in backporting them to 1.8 once we have more experience
> > with them and they're better organized, so that we avoid backporting
> > reorg patches. I'd say we've made quite some progress now and we could
> > possibly backport them. But I wouldn't be surprised if we'd soon rename
> > many of them again since the relation between the level and the prefix
> > letter has to be looked up in the makefile each time, so probably this
> > is something we should improve.
>
> Note that a "reg-tests-help" makefile target dumps the list of LEVELs:

Oh I'm well aware of this and it's where I look this up every time, it's
just that I can't remember them as there's no mnemonic mapping, so I
systematically have to look this up.

> We could set the level with strings:
>
>   h*.vtc -> haproxy
>   s*.vtc -> slow
>   l*.vtc -> low (perhaps this one should be removed).
>   b*.vtc -> bug
>   k*.vtc -> broken
>   e*.vtc -> exp
>
> only a list of levels could be permitted:
>
>   $ LEVEL=haproxy,bug make reg-tests ...
>
> As there is no more level notion here, perhaps we should rename the LEVEL
> environment variable to VTC_TYPES, REGTEST_TYPES or something else.

That could be better. Thinking about it further, since run-regtests
already parses the comments at the head of the files to find the various
prerequisites, maybe instead we should get rid of the difference in the
file name and mention the category with a full word, as you did above,
directly in the files. It would probably be more obvious when editing
these files. The "h*" files could become the default ones ("normal" ?
"default" ?) when no category is set, and the other categories would have
to be explicitly mentioned. One benefit is that we could keep the same
naming during initial submission and final merging when it's related to a
bug report or a broken script.

What do you think ?

Willy
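To make the idea concrete, here is a hypothetical .vtc file header under this proposal. The `REGTEST_TYPE` keyword name is invented for illustration (only REQUIRE-style prerequisite comments parsed by run-regtests exist at this point); the final keyword would be whatever the patch implementing this defines:

```
# Hypothetical header of reg-tests/some-test.vtc under this proposal:
# the category moves from the file-name prefix ("b*.vtc") into the file
# itself, and files without the keyword default to the normal category.

varnishtest "basic check port test"

#REGTEST_TYPE=bug      # was encoded as the "b" prefix in the file name
#REQUIRE_VERSION=1.8   # existing prerequisite comment parsed by run-regtests
```

This keeps the file name stable when a script moves between categories, e.g. from "bug" to the default one once the bug is fixed.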
Re: haproxy 1.9.2 with boringssl
On Wed, Jan 23, 2019 at 09:37:46PM +0100, Aleksandar Lazic wrote:
> On 23.01.2019 at 21:27, Willy Tarreau wrote:
> > On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
> >> Should it be possible to have fe with h1 and be server h2 (alpn h2), as I
> >> expect this or similar return value when I go through haproxy?
> >
> > Yes absolutely. That's even what I'm doing on my tests to try to fix
> > the issues reported by Luke.
>
> Okay, perfect.
>
> Would you like to share your config so that I can see what's wrong with my
> config, thanks.

Sure, here's a copy-paste, hoping I don't mess with anything :-)

    defaults
        mode http
        option http-use-htx
        option httplog
        log stdout format raw daemon
        timeout connect 4s
        timeout client 10s
        timeout server 10s

    frontend decrypt
        bind :4445
        bind :4446 proto h2
        bind :4443 ssl crt rsa+dh2048.pem npn h2 alpn h2
        default_backend trace

    backend trace
        stats uri /stat
        server s1 127.0.0.1:443 ssl alpn h2 verify none
        #server s2 127.0.0.1:80
        #server s3 127.0.0.1:80 proto h2

As you can see you just connect to port 4445.

> >> I haven't seen any log option to get the backend request method, I think
> >> this should be a feature request ;-).
> >
> > What do you mean with "backend request method" precisely ?
>
> As the log is for frontends it would be nice to be able to get this info
> from below also for the backend, to see what was sent to the backend server.

But what is sent to the backend is what comes from the frontend. And there
never is any valid reason for rewriting the method. So the method sent to
the backend is *always* what you receive on the frontend.

Cheers,
Willy
Re: haproxy 1.9.2 with boringssl
On 23.01.2019 at 21:27, Willy Tarreau wrote:
> On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
>> Should it be possible to have fe with h1 and be server h2 (alpn h2), as I
>> expect this or similar return value when I go through haproxy?
>
> Yes absolutely. That's even what I'm doing on my tests to try to fix
> the issues reported by Luke.

Okay, perfect.

Would you like to share your config so that I can see what's wrong with my
config, thanks.

>> I haven't seen any log option to get the backend request method, I think
>> this should be a feature request ;-).
>
> What do you mean with "backend request method" precisely ?

As the log is for frontends it would be nice to be able to get this info
from below also for the backend, to see what was sent to the backend
server. The problem I see is that a tcpdump/tshark does not help to see
what's transferred on the wire when the backend talks via TLS.

https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#8.2.4

### current variables

    | H | %HM | HTTP method (ex: POST)                               | string |
    | H | %HP | HTTP request URI without query string (path)         | string |
    | H | %HQ | HTTP request URI query string (ex: ?bar=baz)         | string |
    | H | %HU | HTTP request URI (ex: /foo?bar=baz)                  | string |
    | H | %HV | HTTP version (ex: HTTP/1.0)                          | string |

### possible new variables

    | H | %bM | Backend HTTP method (ex: POST)                       | string |
    | H | %bP | Backend HTTP request URI without query string (path) | string |
    | H | %bQ | Backend HTTP request URI query string (ex: ?bar=baz) | string |
    | H | %bU | Backend HTTP request URI (ex: /foo?bar=baz)          | string |
    | H | %bV | Backend HTTP version (ex: HTTP/1.0)                  | string |

###

> Willy

Aleks
Re: DDoS protection: ban clients with high HTTP error rates
On 1/23/2019 8:16 AM, Marco Colli wrote:
> 1. Based on advanced conditions (e.g. current user) our Rails application
> decides whether to return a normal response (e.g. 2xx) or a 429 (Too Many
> Requests); it can also return other errors, like 401
> 2. HAProxy bans clients if they produce too many 4xx errors
>
> What do you think about this solution? Also, is it correct to use HAProxy
> directly or is it more performant to use fail2ban on HAProxy logs?

I'm definitely not an expert. My opinion is that you should do both.

I haven't set up the protections in haproxy itself, but I know it can be
done. That's something I plan to look into when I find some time.

Just a couple of days ago, I set up a fail2ban jail that looks at the
haproxy log and initiates bans based on what it finds. It works REALLY
well.

This is the definition that activates the jail, in a config file I've
placed in /etc/fail2ban/jail.d:

    [haproxy-custom]
    enabled = true
    findtime = 120
    bantime = 3600
    logpath = /var/log/debug-haproxy
    maxretry = 20

This is the definition of the filter it uses, in
/etc/fail2ban/filter.d/haproxy-custom.conf:

    [Definition]
    _daemon = haproxy
    failregex = ^%(__prefix_line)s<HOST>(?::\d+)?\s+.*

Basically, if there are 20 or more requests in the log over a timespan of
two minutes from one source IP, that address gets banned.

Most of the NOSRV requests in my log are http->https redirects. All of
the attacks that I have seen since setting this server up have come in as
http, and I have haproxy configured to redirect ALL insecure requests to
https. I do have a few settings in haproxy that result in some
connections being denied entirely, which also produce a log entry.

It's entirely possible that a web application could be badly written such
that it triggers this jail accidentally, but I would expect most
applications to be just fine. Legitimate traffic can produce the
http->https redirects, but it's certainly not likely to get 20 of them in
two minutes.
I may also implement a similar filter for repeated 404 errors, and maybe
other errors like 400 or 500, to cover attacks on the https frontends
where the webserver says the path doesn't exist. The fail2ban package
comes with a filter for 401 responses in the haproxy logs; I based my
regex on that one.

Thanks,
Shawn
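The findtime/maxretry behavior Shawn describes (ban a source once it logs 20 matched lines within a 120-second window) can be sketched in a few lines. This Python sketch is illustrative only, not fail2ban's actual implementation:

```python
from collections import defaultdict, deque

class JailSketch:
    """Minimal sketch of fail2ban's findtime/maxretry decision: ban a
    source once it accumulates maxretry hits within findtime seconds."""

    def __init__(self, findtime=120, maxretry=20, bantime=3600):
        self.findtime = findtime
        self.maxretry = maxretry
        self.bantime = bantime
        self.hits = defaultdict(deque)   # src ip -> timestamps of matched lines
        self.banned_until = {}           # src ip -> unban time

    def observe(self, src, now):
        """Record one matched log line; return True when it triggers a ban."""
        if self.banned_until.get(src, 0) > now:
            return False  # already banned, nothing new to do
        window = self.hits[src]
        window.append(now)
        # drop hits that have aged out of the findtime window
        while window and now - window[0] > self.findtime:
            window.popleft()
        if len(window) >= self.maxretry:
            self.banned_until[src] = now + self.bantime
            return True
        return False

jail = JailSketch()
# 19 redirect hits spread over two minutes: still under the limit
for t in range(19):
    triggered = jail.observe("203.0.113.7", t * 6)
assert not triggered
# the 20th hit inside the same window trips the one-hour ban
assert jail.observe("203.0.113.7", 115)
```

This mirrors why legitimate http->https redirect traffic stays safe: it rarely produces 20 matches from one address inside two minutes.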
Re: haproxy 1.9.2 with boringssl
On Wed, Jan 23, 2019 at 09:08:00PM +0100, Aleksandar Lazic wrote:
> Should it be possible to have fe with h1 and be server h2 (alpn h2), as I
> expect this or similar return value when I go through haproxy?

Yes absolutely. That's even what I'm doing on my tests to try to fix
the issues reported by Luke.

> I haven't seen any log option to get the backend request method, I think
> this should be a feature request ;-).

What do you mean with "backend request method" precisely ?

Willy
Re: haproxy 1.9.2 with boringssl
Hi Willy.

On 23.01.2019 at 19:50, Willy Tarreau wrote:
> Hi Aleks,
>
> On Wed, Jan 23, 2019 at 06:58:25PM +0100, Aleksandar Lazic wrote:
>> backend be_generic_tcp
>>     mode http
>>     balance source
>>     timeout check 5s
>>     option tcp-check
>>
>>     server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check
>>       inter 5s proto h2 ssl ssl-min-ver TLSv1.3 verify none
>
> You need to replace "proto h2" with "alpn h2", so that the application
> protocol is announced to the other host, otherwise it will stick to the
> default, very likely "http/1.1", while haproxy talks h2 there. This can
> explain the 502 when the other side rejected your request.

I have changed it but still no luck.

Should it be possible to have fe with h1 and be server h2 (alpn h2), as I
expect this or similar return value when I go through haproxy?

I haven't seen any log option to get the backend request method, I think
this should be a feature request ;-).

curl -vo /dev/null https://mail.google.com:443
*   Trying 172.217.21.229...
* Connected to mail.google.com (172.217.21.229) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
    CApath: none
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=mail.google.com,O=Google LLC,L=Mountain View,ST=California,C=US
*   start date: Dec 19 08:16:00 2018 GMT
*   expire date: Mar 13 08:16:00 2019 GMT
*   common name: mail.google.com
*   issuer: CN=Google Internet Authority G3,O=Google Trust Services,C=US
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: mail.google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: /mail/
< Expires: Wed, 23 Jan 2019 20:01:34 GMT
< Date: Wed, 23 Jan 2019 20:01:34 GMT
< Cache-Control: private, max-age=7776000
< Content-Type: text/html; charset=UTF-8
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: GSE
< Alt-Svc: clear
< Accept-Ranges: none
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
{ [data not shown]
* Connection #0 to host mail.google.com left intact

Config is now this.
### cat /tmp/haproxy.cfg
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3
global
    # nodaemon
    log stdout format rfc5424 daemon "${LOGLEVEL}"
    stats socket /tmp/sock1 mode 666 level admin
    stats timeout 1h
    tune.ssl.default-dh-param 2048
    ssl-server-verify none
    nbthread "${NUM_THREADS}"

defaults
    log global
    # the format is described at
    # https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4
    # copied from
    # https://github.com/haproxytech/haproxy-docker-arm64v8/blob/master/cfg_files/haproxy.cfg
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
    default-server resolve-prefer ipv4 inter 5s resolvers mydns
    option http-use-htx
    option httplog
    log-format ">>> %ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %rt %sslv %sslc"

resolvers mydns
    nameserver dns1 "${DNS_SRV001}":53
    nameserver dns2 "${DNS_SRV002}":53
    resolve_retries 3
    timeout retry 1s
    hold valid 10s

listen stats
    bind :"${STATS_PORT}"
    mode http
    # Health check monitoring uri.
    monitor-uri /healthz
    # Add your custom health check monitoring failure condition here.
    # monitor fail if
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth "${STATS_USER}":"${STATS_PASSWORD}"

frontend public_tcp
    bind :"${SERVICE_TCP_PORT}" alpn h2,http/1.1
    mode http
    log global
    default_backend be_generic_tcp

backend be_generic_tcp
    mode http
    balance source
    timeout check 5s
    option tcp-check
    server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check inter 5s alpn h2 ssl ssl-min-ver TLSv1.3 verify none
###

Log of haproxy

<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy stats started.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy public_tcp started.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - Proxy be_generic_tcp started.
[WARNING] 022/200030 (1) : be_generic_tcp/google-mail changed its IP from 172.217.21.229 to 172.217.18.165 by mydns/dns1.
<29>1 2019-01-23T20:00:30+00:00 doh-001 haproxy 1 - - be_generic_tcp/google-mail changed its IP from 172.217.21.229 to 172.217.18.165 by mydns/dns1.
:public_tcp.accept(0006)=000c from [127.0.0.1:54308] ALPN=
:public_tcp.clireq[000c:]: GET / HTTP/1.1
:public_tcp.clihdr[000c:]: user-agent: curl/7.29.0
:public_tcp.clihdr[000c:]: host: 127.0.0.1:8443
:public_tcp.clihdr[000c:]: accept: */*
:be_generic_tcp.srvcls[000c:0021]
:be_g
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
On Wed, 23 Jan 2019 at 11:53, Janusz Dziemidowicz wrote:
> 1.14.2 is the current version in Debian testing. Debian seems reluctant to
> use "mainline" nginx versions (1.15.x) so 1.14.x might end up in Debian
> 10. I'll try to file a Debian bug report later today.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920297

--
Janusz Dziemidowicz
Re: haproxy 1.9.2 with boringssl
Hi Aleks,

On Wed, Jan 23, 2019 at 06:58:25PM +0100, Aleksandar Lazic wrote:
> backend be_generic_tcp
>     mode http
>     balance source
>     timeout check 5s
>     option tcp-check
>
>     server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check
>       inter 5s proto h2 ssl ssl-min-ver TLSv1.3 verify none

You need to replace "proto h2" with "alpn h2", so that the application
protocol is announced to the other host, otherwise it will stick to the
default, very likely "http/1.1", while haproxy talks h2 there. This can
explain the 502 when the other side rejected your request.

Willy
Re: [PATCH] runtime do-resolve http action
Hi Baptiste,

On Wed, Jan 23, 2019 at 02:00:58PM +0100, Baptiste wrote:
> Hi Willy,
>
> Please find attached to this email a set of 4 patches which add a new HTTP
> action that can use a dns resolver section to perform a DNS resolution
> based on the output of a fetch.
> The use case is split DNS situations or highly dynamic environments
> where servers behind HAProxy are just ephemeral services.

Ah thanks for having rebased them. I have some comments below, some purely
cosmetic, some less :

> diff --git a/include/types/stream.h b/include/types/stream.h
> index 5e854c5..02eacd9 100644
> --- a/include/types/stream.h
> +++ b/include/types/stream.h
> @@ -119,6 +119,7 @@ struct strm_logs {
>  };
>  
>  struct stream {
> +	enum obj_type obj_type;        /* object type == OBJ_TYPE_STREAM */

Here this drills a 7-bytes hole between obj_type and flags. It would be
better to move this field elsewhere in the struct where there's a hole
already.

> 	int flags;                     /* some flags describing the stream */
> 	unsigned int uniq_id;          /* unique ID used for the traces */
> 	enum obj_type *target;         /* target to use for this stream */
> --

> From 077ea8af588e0f0ac2ac4070d514e27c6dac57c9 Mon Sep 17 00:00:00 2001
> From: Baptiste Assmann
> Date: Mon, 21 Jan 2019 08:34:50 +0100
> Subject: [PATCH 4/4] MINOR: action: new 'http-request do-resolve' action
>
> The 'do-resolve' action is an http-request action which allows to run
> DNS resolution at run time in HAProxy.
> The name to be resolved can be picked up in the request sent by the
> client and the result of the resolution is stored in a variable.
> While the resolution is being performed, the request is paused.
> If the resolution can't provide a suitable result, then the variable
> will be empty. It's up to the admin to take decisions based on this
> statement (e.g. return 503 to prevent loops).
>
> Read carefully the documentation concerning this feature, to ensure your
> setup is secure and safe to be used in production.
> ---
>  doc/configuration.txt  |  54 +-
>  include/proto/action.h |   3 +
>  include/proto/dns.h    |   2 +
>  include/types/action.h |   8 ++
>  include/types/stream.h |  10 ++
>  src/action.c           |  34 +++
>  src/cfgparse.c         |  18 
>  src/dns.c              | 266 +
>  src/proto_http.c       |   9 +-
>  src/stream.c           |  11 ++
>  10 files changed, 407 insertions(+), 8 deletions(-)
>
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index 2a7efe9..0155274 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -4064,7 +4064,6 @@ http-check send-state
>  
>     See also : "option httpchk", "http-check disable-on-404"
>  
> -
>  http-request <action> [options...] [ { if | unless } <condition> ]
>     Access control for Layer 7 requests
>  
> @@ -4219,6 +4218,59 @@ http-request deny [deny_status <status>] [ { if | unless } <condition> ]
>     those that can be overridden by the "errorfile" directive.
>     No further "http-request" rules are evaluated.
>  
> +http-request do-resolve(<var>,<resolvers>,[ipv4,ipv6]) <expr> :
> +  This action performs a DNS resolution of the output of <expr> and stores
> +  the result in the variable <var>. It uses the DNS resolvers section
> +  pointed to by <resolvers>.
> +  It is possible to choose a resolution preference using the optional
> +  arguments 'ipv4' or 'ipv6'.
> +  When performing the DNS resolution, the client side connection is on
> +  pause waiting till the end of the resolution.
> +  If an IP address can be found, it is stored into <var>. If any kind of
> +  error occurs, then <var> is not set.

Just to be sure, it is not set or not modified ? I guess the latter,
which is fine.

> diff --git a/include/types/stream.h b/include/types/stream.h
> index 02eacd9..26a5e4a 100644
> --- a/include/types/stream.h
> +++ b/include/types/stream.h
> @@ -179,6 +179,16 @@ struct stream {
> 	struct list *current_rule_list;  /* this is used to store the current executed rule list. */
> 	void *current_rule;              /* this is used to store the current rule to be resumed. */
> 	struct hlua *hlua;               /* lua runtime context */
> +
> +	/* Context */
> +	union {
> +		struct {
> +			struct dns_requester *dns_requester;
> +			char *hostname_dn;
> +			int hostname_dn_len;
> +			struct act_rule *parent;
> +		} dns;
> +	} ctx;

History has told us that every single time we created a union with a
single field inside hoping to reuse it later, we never reused it. Thus
better directly put the structure and call it "dns_ctx". It will also be
clearer because "ctx" or "context" are unclear here given that a stream
*is* a context already, so you have a generic context in a context.

Also, given that you have a 4-bytes hole after hostname_dn_len, maybe it
could make sense to place your obj_type there
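For illustration, here is a configuration sketch of how the documented action could be used, following the doc excerpt in the patch. The resolvers section, backend names, and variable name are invented for the example, and the exact final syntax is whatever the merged patch documents:

```
# Hedged usage sketch of the proposed do-resolve action (names illustrative):
# resolve the Host header at run time through the "mydns" resolvers section,
# and fail closed when no address was obtained, as the commit message
# suggests (return 503 to prevent loops).
frontend fe_split_dns
    mode http
    http-request do-resolve(txn.dstip,mydns,ipv4) hdr(host),lower
    http-request deny deny_status 503 unless { var(txn.dstip) -m found }
    default_backend be_dynamic

backend be_dynamic
    mode http
    http-request set-dst var(txn.dstip)
    server ephemeral 0.0.0.0:80
```

The deny rule is the "admin takes decisions" part: without it, an unresolvable name would be routed to whatever address the variable last held, or loop back.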
Re: haproxy 1.9.2 with boringssl
Hi.

After some tricky stuff with centos I switched to debian as base image
and was now able to build haproxy with boringssl.

/usr/local/sbin/haproxy -vv
HA-Proxy version 1.9.2 2019/01/16 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_LINUX_SPLICE=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_TFO=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : BoringSSL
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.22 2016-07-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

Now I want to try to make the request to mail.google.com with this
config and runtime.
### cat /tmp/haproxy.cfg
# https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#3
global
    # nodaemon
    log stdout format rfc5424 daemon "${LOGLEVEL}"
    stats socket /tmp/sock1 mode 666 level admin
    stats timeout 1h
    tune.ssl.default-dh-param 2048
    ssl-server-verify none
    nbthread "${NUM_THREADS}"

defaults
    log global
    # the format is described at
    # https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4
    # copied from
    # https://github.com/haproxytech/haproxy-docker-arm64v8/blob/master/cfg_files/haproxy.cfg
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000
    default-server resolve-prefer ipv4 inter 5s resolvers mydns
    option http-use-htx

resolvers mydns
    nameserver dns1 "${DNS_SRV001}":53
    nameserver dns2 "${DNS_SRV002}":53
    resolve_retries 3
    timeout retry 1s
    hold valid 10s

listen stats
    bind :"${STATS_PORT}"
    mode http
    # Health check monitoring uri.
    monitor-uri /healthz
    # Add your custom health check monitoring failure condition here.
    # monitor fail if
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth "${STATS_USER}":"${STATS_PASSWORD}"

frontend public_tcp
    bind :"${SERVICE_TCP_PORT}"
    mode http
    option httplog
    log global
    default_backend be_generic_tcp

backend be_generic_tcp
    mode http
    balance source
    timeout check 5s
    option tcp-check
    server "${SERVICE_NAME}" ${SERVICE_DEST_IP}:${SERVICE_DEST_PORT} check inter 5s proto h2 ssl ssl-min-ver TLSv1.3 verify none
###

Test with curl

### curl -v http://127.0.0.1:8443
* About to connect() to 127.0.0.1 port 8443 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8443 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8443
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Bad Gateway
< cache-control: no-cache
< content-type: text/html
<
502 Bad Gateway
The server returned an invalid or incomplete response.
* Closing connection 0
###

podman.io instead of docker

podman run --rm -it -e LOGLEVEL=debug -e NUM_THREADS=8 -e DNS_SRV001=1.1.1.1 -e DNS_SRV002=8.8.8.8 \
  -e STATS_PORT=7411 -e STATS_USER=test -e STATS_PASSWORD=test -e SERVICE_TCP_PORT=8443 \
  -e SERVICE_NAME=google-mail -e SERVICE_DEST_IP=mail.google.com -e SERVICE_DEST_PORT=443 \
  -e CONFIG_FILE=/mnt/haproxy.cfg -v /tmp/:/mnt/ -p 8443 --expose 8443 --net host \
  me2digital/haproxy-19-boringssl

using CONFIG_FILE :/mnt/haproxy.cfg
<29>1 2019-01-23T17:50:45+00:00 doh-001 haproxy 1 - - Proxy stats started.
<29>1 2019-01-23T17:50:4
Re: H2 Server Connection Resets (1.9.2)
Hi Willy,

This is all very good to hear. I'm glad you were able to get to the
bottom of it all!

Feel free to send along patches if you want me to test before the 1.9.3
release. I'm more than happy to do so.

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder
stadiamaps.com

‐‐‐ Original Message ‐‐‐
On Wednesday, January 23, 2019 6:02 PM, Willy Tarreau wrote:

> Hi Luke,
>
> On Wed, Jan 23, 2019 at 10:47:33AM +0000, Luke Seelenbinder wrote:
>
> > We were using http-reuse always and experiencing this
> > issue (as well as getting 80+% connection reuse). When I scaled it back to
> > http-reuse safe, the frequency of this issue seemed to be much lower.
> > (Perhaps because the bulk of my testing was with one client and somewhat
> > unscientific?)
>
> It could be caused by various things. In my tests the client doesn't even
> use keep-alive so haproxy is less aggressive with connection reuse and
> that could explain some differences.
>
> > > Thus it
> > > definitely is a matter of bad interaction between two streams, or one
> > > stream affecting the connection and hurting the other stream.
> >
> > My debugging spidery-sense points to the same thing.
>
> So I have more info now. There are multiple issues which stack up and
> cause this :
>
>   - the GOAWAY frame indicating the last stream id might be in flight
>     while many more streams have been added. This results in batch
>     deaths once the limit is met ;
>
>   - the last stream ID received in the GOAWAY frame was not considered
>     when calculating the number of available streams, leading to more
>     streams than acceptable by the server being created ;
>
>   - there is an issue with how new streams are attached to idle connections
>     making them non-retryable in case of a failure such as above. I managed
>     to fix this but it still requires some testing with other configs ;
>
>   - another issue affects idle connections, some of them could remain
>     in the idle list while they don't have room anymore because they
>     are removed only when they deliver the last stream, thus the check
>     doesn't support jumps in the number of available streams ; I suspect
>     it could be related to the client aborts that cause server aborts,
>     just because it allowed some excess streams to be sent to a mux which
>     doesn't have room anymore, but I could be wrong ;
>
> And a less important one : the maximum number of concurrent streams per
> connection is global. In this case it's 100 so it's lower than nginx's
> 128 thus it doesn't cause any issue. But we could run into problems with
> this and I must address this to make it per-connection.
>
> With all these changes, I managed to run a long test with no more errors
> and only an immediate retry once in a while if nginx announced the GOAWAY
> too late. When we set the limit ourselves, there's not even any retry
> anymore. Thus I'll continue to work on this and we'll slightly delay 1.9.3
> to collect these fixes. From there we'll be able to see if you still have
> problems and iterate.
>
> > Let me know if you want me to share our config (it's quite complex) with you
> > privately or if there's anything else we can do to assist.
>
> That's kind but now I don't need it anymore, I have everything needed to
> reproduce the whole issue it seems.
>
> Thanks,
> Willy
Re: H2 Server Connection Resets (1.9.2)
Hi Luke,

On Wed, Jan 23, 2019 at 10:47:33AM +0000, Luke Seelenbinder wrote:
> We were using http-reuse always and experiencing this
> issue (as well as getting 80+% connection reuse). When I scaled it back to
> http-reuse safe, the frequency of this issue seemed to be much lower.
> (Perhaps because the bulk of my testing was with one client and somewhat
> unscientific?)

It could be caused by various things. In my tests the client doesn't even
use keep-alive so haproxy is less aggressive with connection reuse and
that could explain some differences.

> > Thus it
> > definitely is a matter of bad interaction between two streams, or one
> > stream affecting the connection and hurting the other stream.
>
> My debugging spidery-sense points to the same thing.

So I have more info now. There are multiple issues which stack up and
cause this :

  - the GOAWAY frame indicating the last stream id might be in flight
    while many more streams have been added. This results in batch
    deaths once the limit is met ;

  - the last stream ID received in the GOAWAY frame was not considered
    when calculating the number of available streams, leading to more
    streams than acceptable by the server being created ;

  - there is an issue with how new streams are attached to idle connections
    making them non-retryable in case of a failure such as above. I managed
    to fix this but it still requires some testing with other configs ;

  - another issue affects idle connections, some of them could remain
    in the idle list while they don't have room anymore because they
    are removed only when they deliver the last stream, thus the check
    doesn't support jumps in the number of available streams ; I suspect
    it could be related to the client aborts that cause server aborts,
    just because it allowed some excess streams to be sent to a mux which
    doesn't have room anymore, but I could be wrong ;

And a less important one : the maximum number of concurrent streams per
connection is global. In this case it's 100 so it's lower than nginx's
128 thus it doesn't cause any issue. But we could run into problems with
this and I must address this to make it per-connection.

With all these changes, I managed to run a long test with no more errors
and only an immediate retry once in a while if nginx announced the GOAWAY
too late. When we set the limit ourselves, there's not even any retry
anymore. Thus I'll continue to work on this and we'll slightly delay 1.9.3
to collect these fixes. From there we'll be able to see if you still have
problems and iterate.

> Let me know if you want me to share our config (it's quite complex) with you
> privately or if there's anything else we can do to assist.

That's kind but now I don't need it anymore, I have everything needed to
reproduce the whole issue it seems.

Thanks,
Willy
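The accounting behind the GOAWAY-related fixes boils down to simple stream-id arithmetic on the h2 connection. A hedged Python sketch of the idea (illustrative only, not haproxy's mux code; names are invented):

```python
def usable_streams(next_stream_id, max_concurrent, in_use, goaway_last_id=None):
    """Sketch of per-connection stream accounting toward an h2 server.
    Client-initiated stream ids are odd and increase by 2 per stream.
    A GOAWAY carrying a last-stream-id caps which new streams the peer
    will still accept; ignoring that cap lets callers create streams
    the server is guaranteed to refuse, in batches."""
    by_concurrency = max_concurrent - in_use
    if goaway_last_id is None:
        return max(by_concurrency, 0)
    # remaining usable ids: next_stream_id, next_stream_id+2, ... goaway_last_id
    by_goaway = max((goaway_last_id - next_stream_id) // 2 + 1, 0)
    return max(min(by_concurrency, by_goaway), 0)

# no GOAWAY seen yet: only the concurrency limit applies
assert usable_streams(next_stream_id=101, max_concurrent=100, in_use=40) == 60
# GOAWAY(last_stream_id=105) arrives: only ids 101, 103, 105 remain usable
assert usable_streams(101, 100, 40, goaway_last_id=105) == 3
# GOAWAY already below our next id: nothing left; without retryable
# streams this shows up as the batch failures described above
assert usable_streams(107, 100, 40, goaway_last_id=105) == 0
```

Because the GOAWAY may be in flight while new streams are being assigned, streams created above the cap must stay retryable on another connection, which is exactly the third bullet's fix.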
Re: Rate-limit relating to the healthy servers count
Hi,

I didn't think about such a trick, it works! Thanks a lot!

On 23/01/2019 11:48, Jarno Huuskonen wrote:
> Hi,
>
> On Wed, Jan 23, Thomas Hilaire wrote:
>> Hi,
>>
>> I want to implement a rate-limit system using the sticky table of
>> HAProxy. Consider that I have 100 servers, and a limit of 10 requests
>> per server, the ACL would be:
>>
>>     http-request track-sc0 int(1) table GlobalRequestsTracker
>>     http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }
>>
>> Now if I want to make this dynamic depending on the healthy servers
>> count, I need to replace the hardcoded `100` by the `nbsrv` converter
>> like this:
>>
>>     http-request track-sc0 int(1) table GlobalRequestsTracker
>>     http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10 }
>>
>> But I'm getting the error:
>>
>>     error detected while parsing an 'http-request deny' condition :
>>     invalid args in converter 'div' : expects an integer or a variable
>>     name in ACL expression
>>     'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.
>>
>> Is there a way to use `nbsrv` as a variable inside the `div` operator?
>
> Untested: does something like this work:
>
>     http-request set-var(req.dummy) nbsrv(GlobalRequestsTracker)
>     http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(req.dummy) gt 10 }
>
> -Jarno
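The arithmetic behind this rule is easy to check outside haproxy: the global request rate tracked in the stick table is divided by the healthy server count (what `nbsrv` reports), and the request is denied when the per-server share exceeds the limit. A Python sketch of that decision (illustrative; `div` in haproxy is integer division, mirrored here, while the divide-by-zero guard is an assumption of the sketch, not documented haproxy behavior):

```python
def should_deny(global_req_rate, healthy_servers, per_server_limit=10):
    """Sketch of the ACL's arithmetic: deny when the global rate divided
    by the number of healthy servers exceeds the per-server limit."""
    if healthy_servers == 0:
        return True  # no server up: fail closed rather than divide by zero
    return global_req_rate // healthy_servers > per_server_limit

# 100 healthy servers, 10 req per server allowed -> 1000 global budget
assert not should_deny(1000, 100)
assert should_deny(1100, 100)
# half the farm goes down: the same global rate now overshoots,
# which is exactly why the divisor must track nbsrv dynamically
assert should_deny(1000, 50)
```

This also shows why the hardcoded `div(100)` version silently over-admits traffic as soon as servers fail.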
DDoS protection: ban clients with high HTTP error rates
Hello!

I use HAProxy in front of a web app / service and I would like to add
DDoS protection and rate limiting. The problem is that each part of the
application has different request rates, and for some customers we must
accept very high request rates and bursts, while this is not allowed for
unauthenticated users for example.

So I was thinking about this solution:

1. Based on advanced conditions (e.g. current user) our Rails application
   decides whether to return a normal response (e.g. 2xx) or a 429 (Too
   Many Requests); it can also return other errors, like 401
2. HAProxy bans clients if they produce too many 4xx errors

What do you think about this solution? Also, is it correct to use HAProxy
directly or is it more performant to use fail2ban on HAProxy logs?

This is the HAProxy configuration that I would like to use:

    frontend www-frontend
        tcp-request connection reject if { src_http_err_rate(st_abuse) ge 5 }
        http-request track-sc0 src table st_abuse
        ...
        default_backend www-backend

    backend www-backend
        ...

    backend st_abuse
        stick-table type ipv6 size 1m expire 10s store http_err_rate(10s)

Do you think that the above rules are correct? Am I missing something?
Also, is it correct to mix *tcp*-request and src_*http*_err_rate in the
frontend? Is it possible to include only the 4xx errors (and not 5xx) in
http_err_rate?

Any suggestion would be greatly appreciated

Thank you
Marco Colli
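The `http_err_rate(10s)` counter stored in the stick table is a period-based rate, not a raw total. A rough sketch of how such rotating two-slot frequency counters (the general scheme haproxy's stick-table rates are based on) estimate a sliding-window rate; this is illustrative, not haproxy's exact freq_ctr code:

```python
def approx_rate(prev_count, curr_count, period_ms, elapsed_ms):
    """Sketch of a two-slot rotating frequency counter: the count from
    the previous period is weighted by how much of that period still
    overlaps the sliding window, then added to the current period's
    count. Integer arithmetic, as a C implementation would use."""
    remaining = max(period_ms - elapsed_ms, 0)
    return curr_count + prev_count * remaining // period_ms

# halfway through the current 10s period, half of the previous period's
# 8 errors still fall inside the sliding 10s window: 3 + 8/2 = 7
assert approx_rate(8, 3, 10_000, 5_000) == 7
# previous period fully aged out: only current errors count
assert approx_rate(8, 3, 10_000, 10_000) == 3
```

The practical consequence for a `ge 5` threshold: a burst of errors keeps influencing the measured rate for up to a full extra period after it ends, which smooths the ban decision rather than resetting it at period boundaries.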
[PATCH] runtime do-resolve http action
Hi Willy,

Please find attached to this email a set of 4 patches which add a new HTTP action that can use a dns resolver section to perform a DNS resolution based on the output of a fetch. The use cases are split-DNS situations or highly dynamic environments where the servers behind HAProxy are just ephemeral services.

Baptiste

From c3baea8c50a7dcbe4557c4a578fcbd252ffb7c56 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann
Date: Tue, 30 Jan 2018 08:10:20 +0100
Subject: [PATCH 3/4] MINOR: obj_type: new object type for struct stream

This patch creates a new obj_type for the struct stream in HAProxy.
---
 include/proto/obj_type.h | 13 +
 include/types/obj_type.h |  1 +
 include/types/stream.h   |  1 +
 3 files changed, 15 insertions(+)

diff --git a/include/proto/obj_type.h b/include/proto/obj_type.h
index 47273ca..19865bb 100644
--- a/include/proto/obj_type.h
+++ b/include/proto/obj_type.h
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include

 static inline enum obj_type obj_type(enum obj_type *t)
@@ -158,6 +159,18 @@ static inline struct dns_srvrq *objt_dns_srvrq(enum obj_type *t)
 	return __objt_dns_srvrq(t);
 }

+static inline struct stream *__objt_stream(enum obj_type *t)
+{
+	return container_of(t, struct stream, obj_type);
+}
+
+static inline struct stream *objt_stream(enum obj_type *t)
+{
+	if (!t || *t != OBJ_TYPE_STREAM)
+		return NULL;
+	return __objt_stream(t);
+}
+
 static inline void *obj_base_ptr(enum obj_type *t)
 {
 	switch (obj_type(t)) {
diff --git a/include/types/obj_type.h b/include/types/obj_type.h
index e141d69..9410718 100644
--- a/include/types/obj_type.h
+++ b/include/types/obj_type.h
@@ -41,6 +41,7 @@ enum obj_type {
 	OBJ_TYPE_CONN,   /* object is a struct connection */
 	OBJ_TYPE_SRVRQ,  /* object is a struct dns_srvrq */
 	OBJ_TYPE_CS,     /* object is a struct conn_stream */
+	OBJ_TYPE_STREAM, /* object is a struct stream */
 	OBJ_TYPE_ENTRIES /* last one : number of entries */
 } __attribute__((packed)) ;
diff --git a/include/types/stream.h b/include/types/stream.h
index 5e854c5..02eacd9 100644
--- a/include/types/stream.h
+++ b/include/types/stream.h
@@ -119,6 +119,7 @@ struct strm_logs {
 };

 struct stream {
+	enum obj_type obj_type;  /* object type == OBJ_TYPE_STREAM */
 	int flags;               /* some flags describing the stream */
 	unsigned int uniq_id;    /* unique ID used for the traces */
 	enum obj_type *target;   /* target to use for this stream */
--
2.7.4

From 7f4b2ae2e0a98efd2fa162e906c4bb641732ae98 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann
Date: Tue, 30 Jan 2018 08:08:04 +0100
Subject: [PATCH 2/4] MINOR: dns: move callback assignment into dns_link_resolution()

In dns.c, dns_link_resolution(), each type of dns requester is managed
separately; that said, the callback function is assigned globally (and
points to server type callbacks only). This design prevents the addition
of new dns requester types, and this patch aims at fixing this
limitation: now, the callback setting is done directly in the portion of
code dedicated to each requester type.
---
 src/dns.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/dns.c b/src/dns.c
index f39f3ff..8ac5024 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -1397,6 +1397,9 @@ int dns_link_resolution(void *requester, int requester_type, int requester_locked
 			req = srv->dns_requester;
 		if (!requester_locked)
 			HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
+
+		req->requester_cb       = snr_resolution_cb;
+		req->requester_error_cb = snr_resolution_error_cb;
 	}
 	else if (srvrq) {
 		if (srvrq->dns_requester == NULL) {
@@ -1407,13 +1410,14 @@
 		}
 		else
 			req = srvrq->dns_requester;
+
+		req->requester_cb       = snr_resolution_cb;
+		req->requester_error_cb = snr_resolution_error_cb;
 	}
 	else
 		goto err;

 	req->resolution = res;
-	req->requester_cb       = snr_resolution_cb;
-	req->requester_error_cb = snr_resolution_error_cb;

 	LIST_ADDQ(&res->requesters, &req->list);

 	return 0;
--
2.7.4

From 077ea8af588e0f0ac2ac4070d514e27c6dac57c9 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann
Date: Mon, 21 Jan 2019 08:34:50 +0100
Subject: [PATCH 4/4] MINOR: action: new 'http-request do-resolve' action

The 'do-resolve' action is an http-request action which allows running a
DNS resolution at run time in HAProxy. The name to be resolved can be
picked from the request sent by the client, and the result of the
resolution is stored in a variable. While the resolution is being
performed, the request is paused. If the resolution can't provide a
suitable result, then the variable will be empty; it's up to the admin
to take decisions based on this state (such as returning a 503 to
prevent loops). Read the documentation concerning this feature
carefully, to ensure your setup is secure and safe to be used in
production.
---
 do
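The diffstat above is truncated in the archive. For context, the action this patch set adds is driven from the configuration roughly as follows — a sketch based on the syntax eventually merged, with made-up resolver, backend and variable names, so the exact form in this patch revision may differ:

```
resolvers mydns
    nameserver dns1 10.0.0.53:53

frontend fe
    bind :8080
    # resolve the Host header at run time, store the result in txn.dstip
    http-request do-resolve(txn.dstip,mydns,ipv4) hdr(Host),lower
    # fail fast if the resolution did not yield an address
    http-request deny deny_status 503 unless { var(txn.dstip) -m found }
    default_backend be_dynamic

backend be_dynamic
    # route to the resolved address
    http-request set-dst var(txn.dstip)
    server clear 0.0.0.0:0
```

The deny rule illustrates the commit message's advice: when the variable stays empty, return an error rather than letting the request loop.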
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
On Wed, 23 Jan 2019 at 10:41, Lukas Tribus wrote:
> > I tested all my servers and I've noticed that nginx is broken too. I
> > am running nginx 1.14.2 with OpenSSL 1.1.1a. The nginx source contains
> > exactly the same function as haproxy:
> > https://trac.nginx.org/nginx/browser/nginx/src/event/ngx_event_openssl.c?rev=ebf8c9686b8ce7428f975d8a567935ea3722da70#L850
> >
> > However, it seems that it might have been fixed in 1.15.2 by this commit:
> > https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c
>
> Thanks for this. It's actually nginx 1.15.4 (September 2018) where
> this commit is present.

Yes, typed too fast ;)

> Are nginx folks aware of the problem? It would probably be wise for
> them to backport the fix to their 1.14 tree ...

1.14.2 is the current version in Debian testing. Debian seems reluctant to use "mainline" nginx versions (1.15.x), so 1.14.x might end up in Debian 10. I'll try to file a Debian bug report later today.

--
Janusz Dziemidowicz
Re: Rate-limit relating to the healthy servers count
Hi,

On Wed, Jan 23, Thomas Hilaire wrote:
> Hi,
>
> I want to implement a rate-limit system using the sticky table of
> HAProxy. Consider that I have 100 servers, and a limit of 10
> requests per server, the ACL would be:
>
> http-request track-sc0 int(1) table GlobalRequestsTracker
> http-request deny deny_status 429 if {
> sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }
>
> Now if I want to make this dynamic depending on the healthy servers
> count, I need to replace the hardcoded `100` with the `nbsrv`
> converter like this:
>
> http-request track-sc0 int(1) table GlobalRequestsTracker
> http-request deny deny_status 429 if {
> sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10
> }
>
> But I'm getting the error:
>
> error detected while parsing an 'http-request deny' condition :
> invalid args in converter 'div' : expects an integer or a variable
> name in ACL expression
> 'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.
>
> Is there a way to use `nbsrv` as a variable inside the `div` operator?

Untested: does something like this work:

http-request set-var(req.dummy) nbsrv(MyBackend)
http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(req.dummy) gt 10 }

-Jarno

--
Jarno Huuskonen
Re: H2 Server Connection Resets (1.9.2)
Hi Willy, > When using "http-reuse always" the issue disappears and I > can never get any issue at all. Now that I've fixed this, I'm seeing the > issue with the SD flags. Now that's interesting. We were using http-reuse always and experiencing this issue (as well as getting 80+% connection reuse). When I scaled it back to http-reuse safe, the frequency of this issue seemed to be much lower. (Perhaps because the bulk of my testing was with one client and somewhat unscientific?) > Thus it > definitely is a matter of bad interaction between two streams, or one > stream affecting the connection and hurting the other stream. My debugging spidery-sense points to the same thing. Let me know if you want me to share our config (it's quite complex) with you privately or if there's anything else we can do to assist. > I now have something to dig into. :-) Best, Luke — Luke Seelenbinder Stadia Maps | Founder stadiamaps.com ‐‐‐ Original Message ‐‐‐ On Wednesday, January 23, 2019 11:39 AM, Willy Tarreau wrote: > On Wed, Jan 23, 2019 at 11:09:53AM +0100, Willy Tarreau wrote: > > > On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote: > > > > > > I've place an nginx instance after my local haproxy dev config, and > > > > found something which might explain what you're observing : the process > > > > apparently leaks FDs and fails once in a while, causing 500 to be > > > > returned : > > > > > > That's fascinating. I would have thought nginx would have had a bit better > > > care given to things like that. . . > > > > Well, it's possible I'm hitting a corner case. I don't want to blame nginx > > for such situations, we all have our share of crap when it comes to error > > handling :-) > > Actually I have to stand corrected, the issue is with our idle connection > management. For some reason we pile up new connections instead of reusing > the previous ones and the nginx process fails to stand extra ones past a > certain point. 
When using "http-reuse always" the issue disappears and I > can never get any issue at all. Now that I've fixed this, I'm seeing the > issue with the SD flags. I don't have this one in the specific case where > I only have one client at a time, though there's still some reuse. Thus it > definitely is a matter of bad interaction between two streams, or one > stream affecting the connection and hurting the other stream. > > I now have something to dig into. > > Thanks, > Willy publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc Description: application/pgp-keys signature.asc Description: OpenPGP digital signature
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
On Wed, Jan 23, 2019 at 10:40:09AM +0100, Lukas Tribus wrote: > Also, we need a big fat warning that all TLSv1.3 users must upgrade in > the next 1.8 and 1.9 stable version announcement containing this fix. That's a good point, this will also encourage distro maintainers to update their versions. > I have filed a tracking bug for this, which can be closed when backported: > https://github.com/haproxy/haproxy/issues/24 > > Closed or not, the tracking bug makes this easier to find. Thanks! Willy
Re: H2 Server Connection Resets (1.9.2)
On Wed, Jan 23, 2019 at 11:09:53AM +0100, Willy Tarreau wrote: > On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote: > > > I've place an nginx instance after my local haproxy dev config, and > > > found something which might explain what you're observing : the process > > > apparently leaks FDs and fails once in a while, causing 500 to be > > > returned : > > > > That's fascinating. I would have thought nginx would have had a bit better > > care given to things like that. . . > > Well, it's possible I'm hitting a corner case. I don't want to blame nginx > for such situations, we all have our share of crap when it comes to error > handling :-) Actually I have to stand corrected, the issue is with our idle connection management. For some reason we pile up new connections instead of reusing the previous ones and the nginx process fails to stand extra ones past a certain point. When using "http-reuse always" the issue disappears and I can never get any issue at all. Now that I've fixed this, I'm seeing the issue with the SD flags. I don't have this one in the specific case where I only have one client at a time, though there's still some reuse. Thus it definitely is a matter of bad interaction between two streams, or one stream affecting the connection and hurting the other stream. I now have something to dig into. Thanks, Willy
Rate-limit relating to the healthy servers count
Hi,

I want to implement a rate-limit system using the sticky table of HAProxy. Consider that I have 100 servers, and a limit of 10 requests per server, the ACL would be:

http-request track-sc0 int(1) table GlobalRequestsTracker
http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(100) gt 10 }

Now if I want to make this dynamic depending on the healthy servers count, I need to replace the hardcoded `100` with the `nbsrv` converter like this:

http-request track-sc0 int(1) table GlobalRequestsTracker
http-request deny deny_status 429 if { sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend)) gt 10 }

But I'm getting the error:

error detected while parsing an 'http-request deny' condition : invalid args in converter 'div' : expects an integer or a variable name in ACL expression 'sc0_http_req_rate(GlobalRequestsTracker),div(nbsrv(MyBackend))'.

Is there a way to use `nbsrv` as a variable inside the `div` operator?

Thanks a lot!
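The condition being built divides the shared global request rate across the healthy servers and compares the per-server share to a limit. The arithmetic can be illustrated in Python (the function name is mine, purely for illustration):

```python
def should_deny(global_req_rate, healthy_servers, per_server_limit=10):
    """Deny when each server's share of the global request rate
    exceeds the per-server limit.

    Mirrors: sc0_http_req_rate(...),div(nbsrv(...)) gt 10
    Like HAProxy's 'div' converter, this uses integer division.
    """
    if healthy_servers == 0:
        return True  # no healthy servers: shed all load
    return global_req_rate // healthy_servers > per_server_limit
```

With 100 healthy servers, a global rate of 1500 req/s gives each server a share of 15, which exceeds the limit of 10, so requests would be denied.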
Re: H2 Server Connection Resets (1.9.2)
On Wed, Jan 23, 2019 at 09:24:19AM +, Luke Seelenbinder wrote: > > I've place an nginx instance after my local haproxy dev config, and > > found something which might explain what you're observing : the process > > apparently leaks FDs and fails once in a while, causing 500 to be returned : > > That's fascinating. I would have thought nginx would have had a bit better > care given to things like that. . . Well, it's possible I'm hitting a corner case. I don't want to blame nginx for such situations, we all have our share of crap when it comes to error handling :-) > Oddly enough, I cannot find any log entries that approximate this. However, > it's possible since we're primarily (99+%) using nginx as a reverse-proxy > that the fd issues wouldn't appear for us. OK, I just deployed it with the default config and added "http2" at the end of the "listen :443 ssl" line. > My next thought is to try tcpdump to try to determine what's on the wire when > the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might > prove difficult. Any suggestions? For me it's a pain when there's SSL in the mix. Some here on the list are used to know how to extract the master key and use it to decipher the traffic, but I don't know how to do this. As an alternative, nginx supports H2 in clear, so if there's a path where you can disable SSL (e.g. on hosts where haproxy and nginx are on the same machine), then you can communicate in clear by having "proto h2" on the haproxy's server line and "http2" in the nginx config, both without the "ssl" keyword. > One more interesting piece of data: if we use htx without h2 on the backends, > we only see CD-- entries consistently (with a very, very few SD-- entries). > Thus, it would seem whatever is causing the issue is directly related to h2 > backends. 
I further think we can safely say it is directly related to h2 > streams breaking (due to client-side request cancellations) resulting in the > whole connection breaking in HAProxy or nginx (though determining which will > be the trick). I'm pretty sure you found an issue in haproxy related to the way these requests are aborted. It's just that trying to reproduce this, I'm first hitting other issues in the way related to the limitations above :-) > There's also a strong possibility we replace nginx with HAProxy entirely for > our SSL + H2 setup as we overhaul the backends, so this problem will probably > be resolved by removing the problematic interaction. One more reason for me to speed up figuring what's happening before it becomes too hard to reproduce it! > I'm still working on running h2load against our nginx servers to see if that > turns anything up. Great, thanks! > > And at this point the connection is closed and reopened for new requests. > > There's never any GOAWAY sent. > > If I'm understanding this correctly, that implies as long as nginx sends > GOAWAY properly, HAProxy will not attempt to reuse the connection? I've discovered that it's not the case contrary to what I thought (I have the patch for this, still just testing it). That's how I ended up finding all this mess, because my nginx never sends me a GOAWAY and sees failures before. > > I managed to work around the problem by limiting the number of total > > requests per connection. I find this extremely dirty but if it helps... > > I just need to figure how to best do it, so that we can use it as well > > for H2 as for H1. > > We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will > probably stick with that for now, since we don't want to have any more > operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per > se, just a business reality. :-) ) I'm more than happy to try out anything > you turn up on our staging setup! 
You're absolutely right on this and don't need to justify your choices. For us having H2 on the backend is only a matter of completeness. While it does make sense for those deploying CDNs for example, or those dealing with APIs, on the local network it doesn't bring any real benefit and further increases the risk of head-of-line blocking due to the shared connection. And it indeed increases the risk of facing early bugs in products. Both haproxy's and nginx's HTTP/1 stacks are proven and rock solid, so you're clearly taking less risks with this. Regards, Willy
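Regarding deciphering the TLS traffic mentioned earlier: when the test traffic can be generated by a client built against OpenSSL (curl, for instance), the SSLKEYLOGFILE mechanism avoids needing the server's private keys at all. A rough sketch — host name and file paths are placeholders:

```shell
# capture the encrypted traffic
tcpdump -i any -w h2.pcap 'port 443' &

# curl dumps the TLS session secrets when SSLKEYLOGFILE is set
SSLKEYLOGFILE=/tmp/keys.log curl --http2 -sv https://backend.example/ -o /dev/null

# decrypt with the logged keys (the option is ssl.keylog_file on
# Wireshark < 3.0, tls.keylog_file on >= 3.0)
tshark -r h2.pcap -o tls.keylog_file:/tmp/keys.log -Y http2
```

This only works for traffic the instrumented client generated itself; capturing haproxy-to-nginx traffic directly still needs cleartext H2 (as suggested above) or key extraction from one of the endpoints.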
Re: H2 Server Connection Resets (1.9.2)
Hi Lukas.

On 23.01.2019 at 10:24, Luke Seelenbinder wrote:
> Hi Willy,
>
> Thanks for continuing to look into this.
>
>> I've place an nginx instance after my local haproxy dev config, and
>> found something which might explain what you're observing : the process
>> apparently leaks FDs and fails once in a while, causing 500 to be returned :
>
> That's fascinating. I would have thought nginx would have had a bit better
> care given to things like that. . .

This can be fixed by increasing the ulimits ;-).

> Oddly enough, I cannot find any log entries that approximate this. However,
> it's possible since we're primarily (99+%) using nginx as a reverse-proxy
> that the fd issues wouldn't appear for us.

What's your ulimit for the nginx process?

> My next thought is to try tcpdump to try to determine what's on the wire when
> the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might
> prove difficult. Any suggestions?

If you have enough log space you can try to activate debug logging in nginx and haproxy.

https://nginx.org/en/docs/debugging_log.html
https://cbonte.github.io/haproxy-dconv/1.9/configuration.html#log => debug

This will have some impact on performance, as every request creates a lot of log lines!

It would be interesting to know which error you get in the nginx log when the CD/SD happen, as 'http2 flood detected' is not in the logs. Which release of nginx do you use? http://hg.nginx.org/nginx/tags

Maybe there are some errors in the log which can be found in this directory. http://hg.nginx.org/nginx/file/release-1.15.8/src/http/v2/

> One more interesting piece of data: if we use htx without h2 on the backends,
> we only see CD-- entries consistently (with a very, very few SD-- entries).
> Thus, it would seem whatever is causing the issue is directly related to h2
> backends.
> I further think we can safely say it is directly related to h2
> streams breaking (due to client-side request cancellations) resulting in the
> whole connection breaking in HAProxy or nginx (though determining which will
> be the trick).
>
> There's also a strong possibility we replace nginx with HAProxy entirely for
> our SSL + H2 setup as we overhaul the backends, so this problem will probably
> be resolved by removing the problematic interaction.

What was the main reason to use nginx between haproxy and the backends? What are the backends?

Regards
Aleks

> I'm still working on running h2load against our nginx servers to see if that
> turns anything up.
>
>> And at this point the connection is closed and reopened for new requests.
>> There's never any GOAWAY sent.
>
> If I'm understanding this correctly, that implies as long as nginx sends
> GOAWAY properly, HAProxy will not attempt to reuse the connection?
>
>> I managed to work around the problem by limiting the number of total
>> requests per connection. I find this extremely dirty but if it helps...
>> I just need to figure how to best do it, so that we can use it as well
>> for H2 as for H1.
>
> We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will
> probably stick with that for now, since we don't want to have any more
> operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per
> se, just a business reality. :-) ) I'm more than happy to try out anything
> you turn up on our staging setup!
> > Best, > Luke > > > — > Luke Seelenbinder > Stadia Maps | Founder > stadiamaps.com > > ‐‐‐ Original Message ‐‐‐ > On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau wrote: > >> Hi Luke, >> > >> I've place an nginx instance after my local haproxy dev config, and >> found something which might explain what you're observing : the process >> apparently leaks FDs and fails once in a while, causing 500 to be returned : >> > >> 2019/01/23 08:22:13 [crit] 25508#0: *36705 open() >> "/usr/local/nginx/html/index.html" failed (24: Too many open files), client: >> 1> >> 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open >> files) >> > >> 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-" >> "Mozilla/4.0 (compatible; MSIE 7.01; Windows)" >> > >> The ones are seen by haproxy : >> > >> 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500 >> 701 - - 1/1/0/0/0 0/0 "GET / HTTP/1.1" >> > >> And at this point the connection is closed and reopened for new requests. >> There's never any GOAWAY sent. >> > >> I managed to work around the problem by limiting the number of total >> requests per connection. I find this extremely dirty but if it helps... >> I just need to figure how to best do it, so that we can use it as well >> for H2 as for H1. >> > >> Best regards, >> Willy >
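On the ulimit question above: the descriptor headroom is typically raised on the nginx side like this (the numbers are only placeholders, to be sized against the expected connection count):

```
# nginx.conf (main context)
worker_rlimit_nofile 65536;    # raise RLIMIT_NOFILE for worker processes

events {
    worker_connections 16384;  # must fit comfortably within the fd limit
}
```

Each proxied connection consumes two descriptors (client side and upstream side), plus log and cache files, so worker_connections should stay well below worker_rlimit_nofile.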
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
On Wed, 23 Jan 2019 at 09:52, Willy Tarreau wrote: > > On Wed, Jan 23, 2019 at 12:07:04AM -0800, Dirkjan Bussink wrote: > > Of course, you're right. New version of the patch attached! > > Now merged, thank you! It's obvious, but because the commit message doesn't not explicitly mention it: This must be backported to 1.8. Also, we need a big fat warning that all TLSv1.3 users must upgrade in the next 1.8 and 1.9 stable version announcement containing this fix. I have filed a tracking bug for this, which can be closed when backported: https://github.com/haproxy/haproxy/issues/24 Closed or not, the tracking bug makes this easier to find. > I tested all my servers and I've noticed that nginx is broken too. I > am running nginx 1.14.2 with OpenSSL 1.1.1a The nginx source contains > exactly the same function as haproxy: > https://trac.nginx.org/nginx/browser/nginx/src/event/ngx_event_openssl.c?rev=ebf8c9686b8ce7428f975d8a567935ea3722da70#L850 > > However, it seems that it might have been fixed in 1.15.2 by this commit: > https://trac.nginx.org/nginx/changeset/e3ba4026c02d2c1810fd6f2cecf499fc39dde5ee/nginx/src/event/ngx_event_openssl.c Thanks for this. It's actually nginx 1.15.4 (September 2018) where this commit is present. Are nginx folks aware of the problem? It would probably be wise for them to backport the fix to their 1.14 tree ... > And just for reference, I've found Chrome bug with this problem (as I > am interested when this will get enabled to keep all my systems > updated) https://bugs.chromium.org/p/chromium/issues/detail?id=923685 Thanks, will subscribe to this bug also. Regards, Lukas
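As a side note, whether a given server survives a KeyUpdate can be checked by hand with OpenSSL 1.1.1's s_client — once the TLS 1.3 handshake completes, single-letter commands on stdin trigger the message (host name is a placeholder):

```shell
openssl s_client -connect haproxy.example:443 -tls1_3
# after the handshake, type on a line by itself:
#   k   send a KeyUpdate (update_not_requested)
#   K   send a KeyUpdate and request one back from the server
# an affected server drops the connection instead of continuing
```

This makes it easy to verify both the broken behaviour and the fix without waiting for a Chrome rollout.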
Re: H2 Server Connection Resets (1.9.2)
Hi Willy, Thanks for continuing to look into this. > > I've place an nginx instance after my local haproxy dev config, and > found something which might explain what you're observing : the process > apparently leaks FDs and fails once in a while, causing 500 to be returned : That's fascinating. I would have thought nginx would have had a bit better care given to things like that. . . Oddly enough, I cannot find any log entries that approximate this. However, it's possible since we're primarily (99+%) using nginx as a reverse-proxy that the fd issues wouldn't appear for us. My next thought is to try tcpdump to try to determine what's on the wire when the CD-- and SD-- pairs appear, but since our stack is SSL e2e, that might prove difficult. Any suggestions? One more interesting piece of data: if we use htx without h2 on the backends, we only see CD-- entries consistently (with a very, very few SD-- entries). Thus, it would seem whatever is causing the issue is directly related to h2 backends. I further think we can safely say it is directly related to h2 streams breaking (due to client-side request cancellations) resulting in the whole connection breaking in HAProxy or nginx (though determining which will be the trick). There's also a strong possibility we replace nginx with HAProxy entirely for our SSL + H2 setup as we overhaul the backends, so this problem will probably be resolved by removing the problematic interaction. I'm still working on running h2load against our nginx servers to see if that turns anything up. > And at this point the connection is closed and reopened for new requests. > There's never any GOAWAY sent. If I'm understanding this correctly, that implies as long as nginx sends GOAWAY properly, HAProxy will not attempt to reuse the connection? > I managed to work around the problem by limiting the number of total > requests per connection. I find this extremely dirty but if it helps... 
> I just need to figure how to best do it, so that we can use it as well > for H2 as for H1. We're pretty satisfied with our h2 fe <-> be h1.1 setup right now, so we will probably stick with that for now, since we don't want to have any more operational issues from bleeding-edge bugs. (Not a comment on HAProxy, per se, just a business reality. :-) ) I'm more than happy to try out anything you turn up on our staging setup! Best, Luke — Luke Seelenbinder Stadia Maps | Founder stadiamaps.com ‐‐‐ Original Message ‐‐‐ On Wednesday, January 23, 2019 8:28 AM, Willy Tarreau wrote: > Hi Luke, > > I've place an nginx instance after my local haproxy dev config, and > found something which might explain what you're observing : the process > apparently leaks FDs and fails once in a while, causing 500 to be returned : > > 2019/01/23 08:22:13 [crit] 25508#0: *36705 open() > "/usr/local/nginx/html/index.html" failed (24: Too many open files), client: > 1> > 2019/01/23 08:22:13 [crit] 25508#0: accept4() failed (24: Too many open files) > > 127.0.0.1 - - [23/Jan/2019:08:22:13 +0100] "GET / HTTP/2.0" 500 579 "-" > "Mozilla/4.0 (compatible; MSIE 7.01; Windows)" > > The ones are seen by haproxy : > > 127.0.0.1:47098 [23/Jan/2019:08:22:13.589] decrypt trace/ngx 0/0/0/0/0 500 > 701 - - 1/1/0/0/0 0/0 "GET / HTTP/1.1" > > And at this point the connection is closed and reopened for new requests. > There's never any GOAWAY sent. > > I managed to work around the problem by limiting the number of total > requests per connection. I find this extremely dirty but if it helps... > I just need to figure how to best do it, so that we can use it as well > for H2 as for H1. > > Best regards, > Willy publickey - luke.seelenbinder@stadiamaps.com - 0xB23C1E8A.asc Description: application/pgp-keys signature.asc Description: OpenPGP digital signature
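The per-connection request cap Willy mentions as a workaround also exists natively on the nginx side of the connection; assuming nginx ≥ 1.11.6, something like the following (numbers arbitrary):

```
http {
    keepalive_requests 1000;   # HTTP/1.x requests per keepalive connection
    http2_max_requests 1000;   # HTTP/2 requests per connection
}
```

Lowering these forces nginx to cycle connections before any slow resource leak accumulates, at the cost of slightly more handshakes.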
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
On Wed, Jan 23, 2019 at 12:07:04AM -0800, Dirkjan Bussink wrote: > Of course, you're right. New version of the patch attached! Now merged, thank you! Willy
Re: HAProxy with OpenSSL 1.1.1 breaks when TLS 1.3 KeyUpdate is used.
Hi Willy, On 22 Jan 2019, at 23:17, Willy Tarreau wrote: > > As you can see it will enable this code when SSL_OP_NO_RENEGOTIATION=0, > which is what BoringSSL does and it needs this code to be disabled. Thus > I think it's better to simply do this : > > +#ifndef SSL_OP_NO_RENEGOTIATION > + /* Please note that BoringSSL defines this macro to zero so don't > + * change this to #if and do not assign a default value to this macro! > + */ > Of course, you’re right. New version of the patch attached! Cheers, Dirkjan 0001-BUG-MEDIUM-ssl-Fix-handling-of-TLS-1.3-KeyUpdate-mes.patch Description: Binary data