Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)

2020-07-15 Thread Christopher Faulet




On 11.07.20 at 00:03, Tim Düsterhus wrote:

Lukas,

On 10.07.20 at 21:04, Lukas Tribus wrote:

I finally pushed this fix in the 2.0. Note the same bug affected the HTTP proxy
mode (using the http_proxy option). In this case, connection retries are now
disabled (on 2.0 only) because the destination address is definitely lost.
It was the easiest way to work around the bug without backporting a bunch of
sensitive patches from 2.1.


Given the importance and impact of this bug you just fixed (at least 5
independent people already reported it on GH and ML) and the amount of
other important fixes already in the tree (including at least one
crash fix), I'm suggesting to release 2.0.16. Unless there are other
important open bugs with ongoing troubleshooting?




Agreed. With 2.2 being released this is a good opportunity to release
updates for all branches down to 1.9 at least to properly mark the
latter as EoL.

I'd also appreciate at least backports happening without a release for
1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.


Bumping this.



Fully agree. I will try to release 2.0.16 this Friday. Backports to the
older versions are probably a good idea too; we just need some time to do so.
No promises on this point :)


--
Christopher Faulet



Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)

2020-07-15 Thread Tim Düsterhus
William,
Christopher,

On 11.07.20 at 00:03, Tim Düsterhus wrote:
> Lukas,
> 
> On 10.07.20 at 21:04, Lukas Tribus wrote:
>>> I finally pushed this fix in the 2.0. Note the same bug affected the HTTP proxy
>>> mode (using the http_proxy option). In this case, connection retries are now
>>> disabled (on 2.0 only) because the destination address is definitely lost.
>>> It was the easiest way to work around the bug without backporting a bunch of
>>> sensitive patches from 2.1.
>>
>> Given the importance and impact of this bug you just fixed (at least 5
>> independent people already reported it on GH and ML) and the amount of
>> other important fixes already in the tree (including at least one
>> crash fix), I'm suggesting to release 2.0.16. Unless there are other
>> important open bugs with ongoing troubleshooting?
>>
>>
> 
> Agreed. With 2.2 being released this is a good opportunity to release
> updates for all branches down to 1.9 at least to properly mark the
> latter as EoL.
> 
> I'd also appreciate at least backports happening without a release for
> 1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.

Bumping this.

Best regards
Tim Düsterhus



Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)

2020-07-10 Thread Tim Düsterhus
Lukas,

On 10.07.20 at 21:04, Lukas Tribus wrote:
>> I finally pushed this fix in the 2.0. Note the same bug affected the HTTP proxy
>> mode (using the http_proxy option). In this case, connection retries are now
>> disabled (on 2.0 only) because the destination address is definitely lost.
>> It was the easiest way to work around the bug without backporting a bunch of
>> sensitive patches from 2.1.
> 
> Given the importance and impact of this bug you just fixed (at least 5
> independent people already reported it on GH and ML) and the amount of
> other important fixes already in the tree (including at least one
> crash fix), I'm suggesting to release 2.0.16. Unless there are other
> important open bugs with ongoing troubleshooting?
> 
> 

Agreed. With 2.2 being released this is a good opportunity to release
updates for all branches down to 1.9 at least to properly mark the
latter as EoL.

I'd also appreciate at least backports happening without a release for
1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.

Best regards
Tim Düsterhus



proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)

2020-07-10 Thread Lukas Tribus
Hello,


On Fri, 10 Jul 2020 at 08:08, Christopher Faulet  wrote:
> Hi,
>
> I finally pushed this fix in the 2.0. Note the same bug affected the HTTP proxy
> mode (using the http_proxy option). In this case, connection retries are now
> disabled (on 2.0 only) because the destination address is definitely lost.
> It was the easiest way to work around the bug without backporting a bunch of
> sensitive patches from 2.1.

Given the importance and impact of this bug you just fixed (at least 5
independent people already reported it on GH and ML) and the amount of
other important fixes already in the tree (including at least one
crash fix), I'm suggesting to release 2.0.16. Unless there are other
important open bugs with ongoing troubleshooting?

lukas@dev:~/haproxy-2.0$ git log --oneline v2.0.15..
d982a8e BUG/MEDIUM: stream-int: Disable connection retries on plain HTTP proxy mode
e8d2423 BUG/MAJOR: stream: Mark the server address as unset on new outgoing connection
b1e9407 MINOR: http: Add support for http 413 status
c3db7c1 BUG/MINOR: backend: Remove CO_FL_SESS_IDLE if a client remains on the last server
0d881b2 BUG/MEDIUM: connection: Continue to recv data to a pipe when the FD is not ready
847271d MINOR: connection: move the CO_FL_WAIT_ROOM cleanup to the reader only
39bb227 BUG/MEDIUM: mux-h1: Subscribe rather than waking up in h1_rcv_buf()
e0ca6ad BUG/MEDIUM: mux-h1: Disable splicing for the conn-stream if read0 is received
0528ae2 BUG/MINOR: mux-h1: Disable splicing only if input data was processed
8e8168a BUG/MINOR: mux-h1: Don't read data from a pipe if the mux is unable to receive
afadc9a BUG/MINOR: mux-h1: Fix the splicing in TUNNEL mode
8e4e357 BUG/MINOR: http_act: don't check capture id in backend (2)
c55e3e1 DOC: configuration: fix alphabetical ordering for tune.pool-{high,low}-fd-ratio
a5e11c0 DOC: configuration: add missing index entries for tune.pool-{low,high}-fd-ratio
ab06f88 BUG/MINOR: proxy: always initialize the trash in show servers state
ca212e5 BUG/MINOR: proxy: fix dump_server_state()'s misuse of the trash
135899e BUG/MEDIUM: pattern: Add a trailing \0 to match strings only if possible
0b77c18 DOC: ssl: add "allow-0rtt" and "ciphersuites" in crt-list
4271c77 MINOR: cli: make "show sess" stop at the last known session
8ba978b BUG/MEDIUM: fetch: Fix hdr_ip misparsing IPv4 addresses due to missing NUL
9bd736c REGTEST: ssl: add some ssl_c_* sample fetches test
15080cb REGTEST: ssl: tests the ssl_f_* sample fetches
d6cd2b3 MINOR: spoe: Don't systematically create new applets if processing rate is low
1b4cc2e BUG/MINOR: http_ana: clarify connection pointer check on L7 retry
d995d5f BUG/MINOR: spoe: correction of setting bits for analyzer
26e1841 REGTEST: Add a simple script to tests errorfile directives in proxy sections
8645299 BUG/MINOR: systemd: Wait for network to be online
b88a37c MEDIUM: map: make the "clear map" operation yield
c5034a3 REGTEST: http-rules: test spaces in ACLs with master CLI
4cdff8b REGTEST: http-rules: test spaces in ACLs
c3a2e35 BUG/MINOR: mworker/cli: fix semicolon escaping in master CLI
da9a2d1 BUG/MINOR: mworker/cli: fix the escaping in the master CLI
7ed43aa BUG/MINOR: cli: allow space escaping on the CLI
249346d BUG/MINOR: spoe: add missing key length check before checking key names
1b7f58f BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks
47a5600 BUG/MINOR: tcp-rules: tcp-response must check the buffer's fullness
9f3bda0 MINOR: http: Add 404 to http-request deny
c09f797 MINOR: http: Add 410 to http-request deny
lukas@dev:~/haproxy-2.0$



Thanks,

Lukas



Re: [BUG] haproxy retries dispatch to wrong server

2020-07-10 Thread Christopher Faulet

On 07/07/2020 at 23:02, Christopher Faulet wrote:

On 07/07/2020 at 15:16, Michael Wimmesberger wrote:

Hi,

I might have found a potentially critical bug in haproxy. It occurs when
haproxy is retrying to dispatch a request to a server. If haproxy fails
to dispatch a request to a server that is either up or has no health
checks enabled, it dispatches the request to a random server in any
backend, in any mode (tcp or http), as long as that server is in the up
state (via tcp-connect or httpchk health checks). In addition, haproxy
logs the correct server although it actually dispatches the request to a
wrong server.



Hi Michael,

Thanks for the reproducer and the detailed description. Thanks to it, I was
able to reproduce the bug. I attached a patch to address it. I will check with
Willy tomorrow morning whether it is the right way to fix it, but it should do
the trick.



Hi,

I finally pushed this fix in the 2.0. Note the same bug affected the HTTP proxy
mode (using the http_proxy option). In this case, connection retries are now
disabled (on 2.0 only) because the destination address is definitely lost.
It was the easiest way to work around the bug without backporting a bunch of
sensitive patches from 2.1.


Thanks again Michael. Your bug report was really helpful.

--
Christopher Faulet



Re: [BUG] haproxy retries dispatch to wrong server

2020-07-07 Thread Christopher Faulet

On 07/07/2020 at 15:16, Michael Wimmesberger wrote:

Hi,

I might have found a potentially critical bug in haproxy. It occurs when
haproxy is retrying to dispatch a request to a server. If haproxy fails
to dispatch a request to a server that is either up or has no health
checks enabled, it dispatches the request to a random server in any
backend, in any mode (tcp or http), as long as that server is in the up
state (via tcp-connect or httpchk health checks). In addition, haproxy
logs the correct server although it actually dispatches the request to a
wrong server.



Hi Michael,

Thanks for the reproducer and the detailed description. Thanks to it, I was
able to reproduce the bug. I attached a patch to address it. I will check with
Willy tomorrow morning whether it is the right way to fix it, but it should do
the trick.


Thanks again!

--
Christopher Faulet
>From bbc89e04a252b3719f221cd0afbad49f507c4c46 Mon Sep 17 00:00:00 2001
From: Christopher Faulet 
Date: Tue, 7 Jul 2020 22:25:19 +0200
Subject: [PATCH] BUG/MAJOR: stream: Mark the server address as unset on new
 outgoing connection

In the connect_server() function, when a new connection is created, it is
important to mark the server address as explicitly unset, by removing the
stream's SF_ADDR_SET flag, to be sure it is set again. On the first connect
attempt, this is not a problem because the flag is not set. But on a
connection failure, the faulty endpoint is detached. This is specific to the
2.0. See commit 7b69c91e7 ("BUG/MAJOR: stream-int: always detach a faulty
endpoint on connect failure") for details. As a result of that commit, on a
connect retry, a new connection is created. But this time, the SF_ADDR_SET
flag is set (except on a redispatch), even though, the connection being new,
the server address is not actually set.

On the other hand, when a connection is released (or when it is created), the
from/to addresses are not cleared. Thus, because of the described bug, when a
connection is taken from the memory pool, the addresses of the previous
connection are used, leading to undefined and random routing. For a totally
new connection, no addresses are set and an internal error is reported by
si_connect().

A reproducer with a detailed description was posted on the ML:

  https://www.mail-archive.com/haproxy@formilux.org/msg37850.html

This patch should fix issue #717. There is no direct mainline commit ID for
this fix, and it must not be backported as it's solely for 2.0.
---
 src/backend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend.c b/src/backend.c
index d0d11779a..2d0fd6a43 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1457,6 +1457,7 @@ int connect_server(struct stream *s)
 
 	/* no reuse or failed to reuse the connection above, pick a new one */
 	if (!srv_conn) {
+		s->flags &= ~SF_ADDR_SET;
 		srv_conn = conn_new();
 		if (srv_conn)
 			srv_conn->target = s->target;
-- 
2.26.2



Re: [BUG] haproxy retries dispatch to wrong server

2020-07-07 Thread Lukas Tribus
Hello Michael,


On Tue, 7 Jul 2020 at 15:16, Michael Wimmesberger
 wrote:
>
> Hi,
>
> I might have found a potentially critical bug in haproxy. It occurs when
> haproxy is retrying to dispatch a request to a server. If haproxy fails
> to dispatch a request to a server that is either up or has no health
> checks enabled, it dispatches the request to a random server in any
> backend, in any mode (tcp or http), as long as that server is in the up
> state (via tcp-connect or httpchk health checks). In addition, haproxy
> logs the correct server although it actually dispatches the request to a
> wrong server.
>
> I could not reproduce this issue on 2.0.14 or any 2.1.x version. This
> happens in both tcp and http mode, and http requests might be dispatched
> to tcp servers and vice versa.
>
> I have tried to narrow this problem down in source using git bisect,
> which results in this commit marked as the first bad one:
> 7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.

Makes sense that 2.1 is not affected because this commit was
specifically written for 2.0 (it's not a backport).

Exceptionally detailed and thorough reporting here, this will help a
lot, thank you.

A bug has been previously filed, but the details mentioned in this
thread will help get things going:
https://github.com/haproxy/haproxy/issues/717



Lukas



[BUG] haproxy retries dispatch to wrong server

2020-07-07 Thread Michael Wimmesberger
Hi,

I might have found a potentially critical bug in haproxy. It occurs when
haproxy is retrying to dispatch a request to a server. If haproxy fails
to dispatch a request to a server that is either up or has no health
checks enabled, it dispatches the request to a random server in any
backend, in any mode (tcp or http), as long as that server is in the up
state (via tcp-connect or httpchk health checks). In addition, haproxy
logs the correct server although it actually dispatches the request to a
wrong server.

I could not reproduce this issue on 2.0.14 or any 2.1.x version. This
happens in both tcp and http mode, and http requests might be dispatched
to tcp servers and vice versa.

I have tried to narrow this problem down in the source using git bisect,
which marks this commit as the first bad one:
7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.


I have created a setup with a minimal config to reproduce this
unintended behavior with a high probability of occurring. The odds of this
bug occurring can be increased by having more backend servers using
health checks. With 2 faulty servers without health checks and 20
servers with health checks, I get about a 90-95% chance of a wrong dispatch.


reduced haproxy.cfg:
# note: replace 127.0.0.1 with the internal ip of the host running the
# container, i.e. 172.17.0.1 when using docker, or the container names
# when using a container-network
# make sure port 8999 is not available
defaults
  mode  http
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 5s
  timeout  client 1m
  timeout  server 1m

frontend fe_http_in
  bind 0.0.0.0:8100
  use_backend be_bad.example.com if { req.hdr(host) bad.example.com }
  use_backend be_good.example.com if { req.hdr(host) good.example.com }

backend be_bad.example.com
  server bad.example.com_8999 127.0.0.1:8999 # make sure this port is not bound

backend be_good.example.com
  server good.example.com_8070 127.0.0.1:8070 check

listen li_bad.example.com_tcp_39100:
  bind 0.0.0.0:39100
  mode tcp
  server bad.example.com_tcp_8999 127.0.0.1:8999 # make sure this port is not bound

listen li_good.example.com_tcp_39200:
  bind 0.0.0.0:39200
  mode tcp
  server good.example.com_tcp_8071 127.0.0.1:8071 check

running test-webservices:
podman run -d --rm -p 8070:80 --name nginxdemo nginxdemos/hello
podman run -d --rm -p 8071:8000 --name crccheckdemo crccheck/hello-world
# note: I am running two different webservices to highlight the random
# aspect of the redispatch

run haproxy inside a container:
podman run -it --rm \
  --name haproxy \
  -v "${PWD}/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" \
  -p 8100:8100 \
  -p 39100:39100 \
  -p 39200:39200 \
  haproxy:2.0.15-alpine
# note: I have selinux enabled and thus require :z or :Z to mount a file
# or directory into the container


testing using curl:
# expected: HTTP/1.1 503 Service Unavailable
curl -sv -o /dev/null http://bad.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: nothing (curl writes "Empty reply from server")
curl -sv -o /dev/null http://127.0.0.1:39100 2>&1 | grep HTTP/1

# expected: HTTP/1.1 200 OK
curl -sv -o /dev/null http://good.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: HTTP/1.0 200 OK
curl -sv -o /dev/null http://127.0.0.1:39200 2>&1 | grep HTTP/1


In this setup, the curls that get mismatched to a wrong backend server
flip, in consecutive runs, between HTTP/1.1 (when dispatched to
nginxdemos/hello), HTTP/1.0 (when dispatched to crccheck/hello-world),
and the correct response (503 or nothing).

I have attached a simple script which recreates this small test-setup
using podman, but it could fairly easily be converted to docker.


cheers,
Michael



create-setup.sh
Description: application/shellscript