Re: 2.0.14 + htx / retry-on all-retryable-errors -> sometimes wrong backend/server used
Hi Jarno,

On Tue, May 19, 2020 at 03:26:22PM +, Jarno Huuskonen wrote:
> Hi,
>
> On Tue, 2020-05-19 at 15:58 +0200, Christopher Faulet wrote:
> > It was already reported on github and seems to be fixed. We are just
> > waiting for feedback to be sure it is fixed before backporting the
> > patch. See https://github.com/haproxy/haproxy/issues/623.
> >
> > If you try the latest 2.2 snapshot, it should be good. You may also
> > try to cherry-pick the commit 8cabc9783 to the 2.0.
>
> Thanks Christopher (and Tim), I'll try with 2.2 snapshot (and/or)
> 8cabc9783 and report how it goes.

Note, regarding this, I'd like us to emit another set of stable versions, but we've got a report of a very tricky situation involving l7 retries and HTTP reuse which can sometimes lead to a crash in 2.0, and I'd really like to understand it fully, reproduce it and fix it. So in the meantime, it's possible that 2.2 would be more reliable than 2.0 when it comes to l7 retries.

Cheers,
Willy
Re: 2.0.14 + htx / retry-on all-retryable-errors -> sometimes wrong backend/server used
Hi,

On Tue, 2020-05-19 at 15:58 +0200, Christopher Faulet wrote:
> It was already reported on github and seems to be fixed. We are just
> waiting for feedback to be sure it is fixed before backporting the
> patch. See https://github.com/haproxy/haproxy/issues/623.
>
> If you try the latest 2.2 snapshot, it should be good. You may also
> try to cherry-pick the commit 8cabc9783 to the 2.0.

Thanks Christopher (and Tim), I'll try with 2.2 snapshot (and/or) 8cabc9783 and report how it goes.

-Jarno

--
Jarno Huuskonen
Re: 2.0.14 + htx / retry-on all-retryable-errors -> sometimes wrong backend/server used
On 19/05/2020 at 15:36, Jarno Huuskonen wrote:
> Hi,
>
> I think I found a case when haproxy-2.0.14 with htx and retry-on
> all-retryable-errors sometimes seems to select wrong backend/server
> to retry. (Doesn't happen on every retry).
>
> I found that sometimes when our wordpress backend gave 500 error
> haproxy would retry on wrong backend.
>
> Here's a very simplified redacted config, (I can provide full config
> off list (if needed)). wordpress frontend has http/2 enabled:
>
> frontend FE_wp
>     bind ipv4@address-here:443 name wpv4a1 ssl crt /etc/haproxy/ssl/crt1.pem alpn h2,http/1.1 crt /etc/haproxy/ssl/crt2.pem alpn h2,http/1.1 ssl-min-ver TLSv1.2
>     ...
>     use_backend %[req.hdr(Host),lower,map_dom(/path/to/wordpress_backends.map,BE_wp_blogs)]
>     default_backend BE_wp_blogs
>
> # This is the backend that sometimes gave 500 errors
> backend BE_wp_blogs2
>     ...
>     retries 2
>     option redispatch
>     option prefer-last-server
>     retry-on all-retryable-errors
>     balance roundrobin
>     timeout connect 4500ms
>     timeout server 40s
>     timeout queue 4s
>     timeout check 5s
>     cookie cookiename insert indirect nocache httponly maxidle 20m
>     default-server inter 15s downinter 25s rise 2 error-limit 250 on-error fail-check
>     server name1 ip1:8443 id 1 cookie name1 check observe layer7
>     server name2 ip2:8443 id 2 cookie name2 check observe layer7
>     server name3 ip3:8443 id 3 cookie name3 check observe layer7
>
> # These are the unrelated/wrong backend that
> backend wrong1
>     ...
>     server diff1 different_ip:2048 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
>     server diff2 different_ip:2048 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2
>
> backend wrong2
>     ...
>     server diffssl1 different_ip:2443 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
>     server diffssl2 different_ip:2443 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2
>
> All backends had servers with same numeric id's id 1-3 for wordpress
> and "wrong" backend servers with id's 1-2. I tried changing all
> backends server id's but I still sometimes get wrong backend/server.
>
> If I set no option http-use-htx or retry-on conn-failure then
> AFAIK (limited testing) the problem doesn't happen.
>
> I haven't managed to reproduce with simple php script that just gives
> 500 error, so there could be some timing that triggers this.
>
> Example haproxy log when request goes to wrong backend/server:
>
> haproxy[258199]: client-ipv6-address:42630 [19/May/2020:16:09:29.366] FE_wp~ BE_wp_blogs2/name3 0/0/89/0/89 400 134 - - --VU 1/1/0/121/2 0/0 {hostheader} "GET /wp-admin/ HTTP/2.0" 443 HTTP/2
>
> (And the wrong backend/server (tomcat in this case) logs this:
> haproxy-ip-address - - [19/May/2020:16:09:29 +0300] "-" 400 - 0ms JSESSIONID=-)
> (400 error because haproxy sends http to tomcat https port).
>
> tshark -e text shows this for the wordpress backend 500 response that
> can trigger retry on wrong backend:
>
> "layers": {
>   "http.response.code": [ "500" ],
>   "http.response.phrase": [ "Internal Server Error" ],
>   "text": [
>     "Timestamps",
>     "HTTP/1.1 500 Internal Server Error\\r\\n",
>     "\\r\\n",
>     "HTTP chunked response",
>     "Data chunk (2892 octets)",
>     "End of chunked encoding",
>     "\\r\\n",
> (I've omitted the response body).
>
> Any more tests etc. I could try to figure out what's going on ?
> Perhaps try with latest 2.2-dev ?

Hi,

It was already reported on github and seems to be fixed. We are just waiting for feedback to be sure it is fixed before backporting the patch. See https://github.com/haproxy/haproxy/issues/623.

If you try the latest 2.2 snapshot, it should be good. You may also try to cherry-pick the commit 8cabc9783 to the 2.0.

--
Christopher Faulet
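For readers skimming the thread, the workaround mentioned in the quoted report (limiting L7 retries to connection failures) might look roughly like the sketch below. It reuses the redacted names from the posted config; where exactly the directive was placed is an assumption, and the report only says the problem was not seen after limited testing, so this is not a confirmed fix. The other option mentioned in the report was `no option http-use-htx`.

```
backend BE_wp_blogs2
    retries 2
    option redispatch
    # Workaround from the report (placement assumed): retry only on
    # connection failures instead of "retry-on all-retryable-errors".
    retry-on conn-failure
    server name1 ip1:8443 id 1 cookie name1 check observe layer7
    server name2 ip2:8443 id 2 cookie name2 check observe layer7
    server name3 ip3:8443 id 3 cookie name3 check observe layer7
```

The suggestion made in the thread itself remains the 2.2 snapshot or cherry-picking commit 8cabc9783.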
Re: 2.0.14 + htx / retry-on all-retryable-errors -> sometimes wrong backend/server used
Jarno,

On 19.05.20 at 15:36, Jarno Huuskonen wrote:
> I think I found a case when haproxy-2.0.14 with htx and retry-on
> all-retryable-errors sometimes seems to select wrong backend/server
> to retry. (Doesn't happen on every retry).
>

I believe this is a duplicate of this issue: https://github.com/haproxy/haproxy/issues/623

Best regards
Tim Düsterhus
2.0.14 + htx / retry-on all-retryable-errors -> sometimes wrong backend/server used
Hi,

I think I found a case when haproxy-2.0.14 with htx and retry-on all-retryable-errors sometimes seems to select wrong backend/server to retry. (Doesn't happen on every retry).

I found that sometimes when our wordpress backend gave 500 error haproxy would retry on wrong backend.

Here's a very simplified redacted config, (I can provide full config off list (if needed)). wordpress frontend has http/2 enabled:

frontend FE_wp
    bind ipv4@address-here:443 name wpv4a1 ssl crt /etc/haproxy/ssl/crt1.pem alpn h2,http/1.1 crt /etc/haproxy/ssl/crt2.pem alpn h2,http/1.1 ssl-min-ver TLSv1.2
    ...
    use_backend %[req.hdr(Host),lower,map_dom(/path/to/wordpress_backends.map,BE_wp_blogs)]
    default_backend BE_wp_blogs

# This is the backend that sometimes gave 500 errors
backend BE_wp_blogs2
    ...
    retries 2
    option redispatch
    option prefer-last-server
    retry-on all-retryable-errors
    balance roundrobin
    timeout connect 4500ms
    timeout server 40s
    timeout queue 4s
    timeout check 5s
    cookie cookiename insert indirect nocache httponly maxidle 20m
    default-server inter 15s downinter 25s rise 2 error-limit 250 on-error fail-check
    server name1 ip1:8443 id 1 cookie name1 check observe layer7
    server name2 ip2:8443 id 2 cookie name2 check observe layer7
    server name3 ip3:8443 id 3 cookie name3 check observe layer7

# These are the unrelated/wrong backend that
backend wrong1
    ...
    server diff1 different_ip:2048 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
    server diff2 different_ip:2048 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2

backend wrong2
    ...
    server diffssl1 different_ip:2443 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
    server diffssl2 different_ip:2443 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2

All backends had servers with same numeric id's id 1-3 for wordpress and "wrong" backend servers with id's 1-2. I tried changing all backends server id's but I still sometimes get wrong backend/server.

If I set no option http-use-htx or retry-on conn-failure then AFAIK (limited testing) the problem doesn't happen.

I haven't managed to reproduce with simple php script that just gives 500 error, so there could be some timing that triggers this.

Example haproxy log when request goes to wrong backend/server:

haproxy[258199]: client-ipv6-address:42630 [19/May/2020:16:09:29.366] FE_wp~ BE_wp_blogs2/name3 0/0/89/0/89 400 134 - - --VU 1/1/0/121/2 0/0 {hostheader} "GET /wp-admin/ HTTP/2.0" 443 HTTP/2

(And the wrong backend/server (tomcat in this case) logs this:
haproxy-ip-address - - [19/May/2020:16:09:29 +0300] "-" 400 - 0ms JSESSIONID=-)
(400 error because haproxy sends http to tomcat https port).

tshark -e text shows this for the wordpress backend 500 response that can trigger retry on wrong backend:

"layers": {
  "http.response.code": [ "500" ],
  "http.response.phrase": [ "Internal Server Error" ],
  "text": [
    "Timestamps",
    "HTTP/1.1 500 Internal Server Error\\r\\n",
    "\\r\\n",
    "HTTP chunked response",
    "Data chunk (2892 octets)",
    "End of chunked encoding",
    "\\r\\n",
(I've omitted the response body).

Any more tests etc. I could try to figure out what's going on ? Perhaps try with latest 2.2-dev ?
-Jarno

(haproxy -vv:
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_REGPARM=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1d 10 Sep 2019
Running on OpenSSL version : OpenSSL 1.1.1d 10 Sep 2019
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("def