Re: Issue with HTX

2019-10-09 Thread GARDAIS Ionel
I was using 2.0.7 and had the same issue with 2.1-dev2.

It happens with the same tool but does not seem to be triggered every time.
Is it possible to enable H1+H2 debug traces on a specific backend?

-- 
Ionel GARDAIS
Tech'Advantage CIO - IT Team manager

- Original message -
From: "Willy Tarreau" 
To: "Ionel GARDAIS" 
Cc: "haproxy" 
Sent: Wednesday, October 9, 2019 04:27:42
Subject: Re: Issue with HTX

Hi Ionel,

On Tue, Oct 08, 2019 at 03:33:16PM +0200, GARDAIS Ionel wrote:
> Hi, 
> 
> I'm facing data corruption that seems related to HTX.
> Simply adding 'no option http-use-htx' solves the issue.
> 
> What kind of traces do you need to help diagnose the issue, and how do I
> collect them?

Ideally, network captures taken before and after haproxy would help. Are
you sure you're running an up-to-date version? Christopher addressed
one such issue recently in H2. If you're able to reproduce this at
will, it may be useful to run on the latest 2.1-dev, where we can help
you enable H1+H2 debug traces.

Thanks,
Willy




Re: healthchecks (to uwsgi) possible regression 1.9.8 -> 1.9.9

2019-10-09 Thread Willy Tarreau
Hi Jarno,

On Wed, Oct 09, 2019 at 02:20:00PM +0300, Jarno Huuskonen wrote:
> > There's no real state machine there, it's roughly "if request was not
> > sent, send it, otherwise try to receive a response; if response is
> > received, then process it and close otherwise wait for response". So
> > we cannot remain in a "waiting for closing" state after we receive a
> > response. I'm wondering if that couldn't be achieved using tcp-checks
> > however. I quickly tried but couldn't find a reliable way of doing this
> > but I'm thinking that we could possibly extend the tcp-check rules to have
> > "tcp-check expect close" that would wait for a read0.
> 
> Should there be some kind of timeout for how long to wait for the close? I'm
> thinking about backend servers that have very long/infinite keepalive
> (and/or don't respect Connection: close).

Yes, we should have this once we rework the checks part. I've already added
this in an issue. The timeout should basically be the regular check
timeout: it would simply terminate the current check and the connection
before starting a new one, and would simply refrain from complaining!

Cheers,
Willy



Re: healthchecks (to uwsgi) possible regression 1.9.8 -> 1.9.9

2019-10-09 Thread Jarno Huuskonen
Hi,

Thanks Willy for looking into this !

On Tue, Oct 08, Willy Tarreau wrote:
> On Fri, Oct 04, 2019 at 07:28:15PM +0300, Jarno Huuskonen wrote:
> > I sent pcap/strace offlist.
> 
> Thanks, that was very useful.
> 
> > (strace -f -o -ttt, tcpdump -n -p -s 16384 -w ... host 127.0.0.1 and
> > port 8080).
> > 
> > I think in packet capture the second health checks causes
> > "client_addr: 127.0.0.1 client_port: 2779] hr_read(): Connection reset by 
> > peer [plugins/http/http.c line 917]"
> > (I think uwsgi logs client_port incorrectly; ntohs(2779) gives 56074
> > (and port 56074 is in packet capture)).
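The byte-order mix-up described above is easy to verify: 2779 is 0x0ADB, and swapping its two bytes yields 0xDB0A, i.e. 56074. A small Python sketch (illustrative only, not uwsgi code):

```python
import struct

# uwsgi apparently logged the raw 16-bit port value without converting
# it from network byte order. Re-interpreting the logged value with its
# two bytes swapped recovers the real client port from the capture.
logged_port = 2779  # 0x0ADB as it appeared in the uwsgi log
real_port = struct.unpack("<H", struct.pack(">H", logged_port))[0]
print(real_port)    # 56074 (0xDB0A), matching the packet capture
```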
> 
> As usual, when you start troubleshooting a bug you find a myriad of
> others around :-)
> 
> > (haproxy version: HA-Proxy version 2.1-dev2 2019/10/01).
> > 
> > I tried to reproduce with a very minimal flask/uwsgi hello-world app,
> > and there hr_read happens very rarely.
> > With the alerta(.io) app this happens more regularly (AFAIK not with every
> > check).
> > So maybe this is a weird timing issue or a bug in uwsgi.
> 
> So it is indeed a timing issue. The check running on port 56062 shows the
> following sequence:
> 
> 17:10:57.877876 IP localhost.56062 > localhost.8080: Flags [S]
> 17:10:57.877898 IP localhost.8080 > localhost.56062: Flags [S.]
> 17:10:57.877909 IP localhost.56062 > localhost.8080: Flags [.]
> 17:10:57.878065 IP localhost.56062 > localhost.8080: Flags [P.]
> 17:10:57.878078 IP localhost.8080 > localhost.56062: Flags [.]
> 17:10:57.879933 IP localhost.8080 > localhost.56062: Flags [P.]
> 17:10:57.879939 IP localhost.56062 > localhost.8080: Flags [.]
> 17:10:57.880008 IP localhost.8080 > localhost.56062: Flags [F.]
> 17:10:57.880333 IP localhost.56062 > localhost.8080: Flags [F.]
> 17:10:57.880341 IP localhost.8080 > localhost.56062: Flags [.]
> 
> Note the FIN sent 75 microseconds after the response. This resulted in
> recvfrom() returning zero and the connection being cleanly closed. Now
> regarding port 56074 that is causing excess logs :
> 
> 17:11:04.132867 IP localhost.56074 > localhost.8080: Flags [S]
> 17:11:04.132890 IP localhost.8080 > localhost.56074: Flags [S.]
> 17:11:04.132904 IP localhost.56074 > localhost.8080: Flags [.]
> 17:11:04.133083 IP localhost.56074 > localhost.8080: Flags [P.]
> 17:11:04.133098 IP localhost.8080 > localhost.56074: Flags [.]
> 17:11:04.135101 IP localhost.8080 > localhost.56074: Flags [P.]
> 17:11:04.135107 IP localhost.56074 > localhost.8080: Flags [.]
> 17:11:04.135316 IP localhost.56074 > localhost.8080: Flags [R.]
> 
> As you can see, even 215 microseconds after the response there is still
> no FIN. I've checked and in both cases the headers are the same. There
> is indeed a "connection: close" header emitted by haproxy, none is
> advertised in the response, but clearly it's a matter of delay. And
> this causes recvfrom() to return EAGAIN, so haproxy is forced to hard-
> close the connection (as if it were doing option http-server-close).
> Please also note that returned data are properly ACKed so there is not
> much more we can do there.
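The distinction Willy draws above, a read of zero after the peer's FIN versus EAGAIN when the peer has sent a response but not yet closed, is the underlying socket behaviour. A small Python sketch of just that behaviour (not haproxy code):

```python
import socket

# A connected socket pair stands in for the check connection.
srv, cli = socket.socketpair()
cli.setblocking(False)

# Peer has not closed and there is nothing to read: the kernel reports
# EAGAIN (surfaced as BlockingIOError in Python), which is the case
# that forces haproxy to hard-close the check connection itself.
try:
    cli.recv(1024)
except BlockingIOError:
    print("EAGAIN: no data and no FIN yet")

# Peer closes cleanly: recv() now returns b'', i.e. a read0, which lets
# the connection be closed gracefully.
srv.close()
print(cli.recv(1024) == b"")  # True
cli.close()
```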
> 
> Before the patch that you bisected, you didn't face this because of a
> bug that was preventing the hard close from working, and instead you
> were accumulating TIME_WAIT sockets on the client side. So I'm afraid
> to say that for now the best I can say is that you'll still have to
> live with these occasional logs :-/

Thanks. Glad to see this wasn't a bug on the haproxy side.

I tested the alerta app with gunicorn, and gunicorn doesn't seem to mind
that haproxy uses a forced close on the health check, so switching from
uwsgi to gunicorn is another option.

> I'd really like to see a massive rework being done on the checks. Your
> report made me realize that there is one feature we're currently missing,
> which is the ability to wait for a close once we've got a response. At
> the moment we cannot add this because checks work in 2 steps:
> 
>  1) connect and send the "request"
>  2) receive the "response" and close.

And AFAIK this works just fine with most sane backend servers.

> There's no real state machine there, it's roughly "if request was not
> sent, send it, otherwise try to receive a response; if response is
> received, then process it and close otherwise wait for response". So
> we cannot remain in a "waiting for closing" state after we receive a
> response. I'm wondering if that couldn't be achieved using tcp-checks
> however. I quickly tried but couldn't find a reliable way of doing this
> but I'm thinking that we could possibly extend the tcp-check rules to have
> "tcp-check expect close" that would wait for a read0.

Should there be some kind of timeout for how long to wait for the close? I'm
thinking about backend servers that have very long/infinite keepalive
(and/or don't respect Connection: close).

-Jarno

-- 
Jarno Huuskonen