Hi Willy,
On Fri, Oct 04, Willy Tarreau wrote:
> Hi Jarno,
>
> On Wed, Oct 02, 2019 at 01:08:14PM +0300, Jarno Huuskonen wrote:
> > Hello,
> >
> > I was testing haproxy -> uwsgi(alert.io) and noticed a possible regression
> > with healthchecks(httpchk).
> > With 1.9.9 uwsgi logs:
> > [uwsgi-http key: host.name.fi client_addr: 127.0.0.1 client_port: 45715]
> > hr_read(): Connection reset by peer [plugins/http/http.c line 917]
> >
> > health checks work
> > (option httpchk GET /_ HTTP/1.1\r\nHost:\ host.name.fi\r\nUser-Agent:\
> > haproxy)
> > but uwsgi logs the hr_read() warning/error.
> >
> > I bisected 1.9.9 and this commit is probably the commit that changes
> > behaviour between 1.9.8 and 1.9.9:
> > 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71 is the first bad commit
> > commit 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71
> > Author: Olivier Houchard
> > Date: Fri Jun 14 15:26:06 2019 +0200
> >
> > BUG/MEDIUM: connections: Don't call shutdown() if we want to disable
> > linger.
> >
> > In conn_sock_shutw(), avoid calling shutdown() if linger_risk is set.
> > Not
> > doing so will result in getting sockets in TIME_WAIT for some time.
> > This is particularly observable with health checks.
> >
> > This should be backported to 1.9.
> >
> > (cherry picked from commit fe4abe62c7c5206dff1802f42d17014e198b9141)
> > Signed-off-by: Christopher Faulet
>
> Hmmm that's annoying, really, because we've opened a huge can of worms
> when fixing the first failed check and we're constantly displacing the
> problem somewhere else :-/
>
> Yes, please do provide an strace, and a tcpdump, that would be nice.
> I suspect that we'll possibly see a FIN from the server, without the
> equivalent recv()==0 in haproxy, and that the setsockopt() call
> resulting in the RST being sent doesn't ack the FIN. Normally we
> should perform a clean shutdown+close if the FIN was received with
> the response and detected in time.
I sent pcap/strace offlist.
(strace -f -o -ttt, tcpdump -n -p -s 16384 -w ... host 127.0.0.1 and
port 8080).
I think in packet capture the second health checks causes
"client_addr: 127.0.0.1 client_port: 2779] hr_read(): Connection reset by peer
[plugins/http/http.c line 917]"
(I think uswgi logs client_port incorrectly, ntohs(2779) gives 56074
(and port 56074 is in packet capture)).
(haproxy version: HA-Proxy version 2.1-dev2 2019/10/01).
I tried to reproduce with very minimal flask/uwsgi hello world app
and there hr_read happens very rarely.
With alerta(.io) app this happens more regularly (AFAIK not with every check).
So maybe this is weird timing issue or bug in uwsgi.
> I'm seeing that your check request doesn't contain "connection: close",
> so actually it's possible that your server doesn't send the SYN, in
> which case we really need to close with RST. Could you please try to
> add "connection: close" to your httpchk line ?
I had Connection: close, but removed it after I added
http-check expect string OK
(http-check expect adds Connection: close).
-Jarno
--
Jarno Huuskonen