Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-27 Thread Milan Petruželka
On Fri, 27 Jul 2018 at 10:08, Willy Tarreau  wrote:

> Hi Olivier,
>
> On Fri, Jul 27, 2018 at 09:04:04AM +0200, Olivier Doucet wrote:
> > 24 hours later, still no issue to be reported. All sessions are expiring
> > just fine. I think you can merge :)
>
> Yes I think you're right, I'll do this, it will at least help all the users
> who don't want to patch their versions. We'll probably emit another 1.8
> soon.


Hi,

After 2 days I also have no blocked connections. There's no need to wait
until Monday as I suggested yesterday.

Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-26 Thread Milan Petruželka
On Wed, 25 Jul 2018 at 18:39, Willy Tarreau  wrote:

> > This issue is very close to Milan's bug, that's why I posted as a reply. If
> > I'm wrong, I'll split it in another thread.
>
> It's definitely the same as Milan's from my point of view. Many thanks
> for reporting it as well.
>
>
Hi Willy,

My haproxy with the last two patches has had no blocked connections since it
started 24 hours ago. Before these patches it always had at least one. I
think you finally nailed it. To be on the safe side, I would wait until
Monday before we start to celebrate.

Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-25 Thread Milan Petruželka
Hi Willy,

On Tue, 24 Jul 2018 at 14:37, Willy Tarreau  wrote:

> So I'm having one update to emit the missing info on "show fd" (patch
> merged
> and pushed already, that I'm attaching here if it's easier for you)


I've compiled version 1.8.12-12a4b5-16 from Git and let it run overnight.
Now I have 4 blocked connections. maxid is always equal to lastid:

 14 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x17c51f0
iocb=0x4d34d0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x17a5240 st0=7 err=5 maxid=133 lastid=133
flg=0x1000 nbst=6 nbcs=0 fctl_cnt=0 send_cnt=6 tree_cnt=6 orph_cnt=6
dbuf=0/0 mbuf=0/16384
 19 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x178fab0
iocb=0x4d34d0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x1798be0 st0=7 err=5 maxid=131 lastid=131
flg=0x1000 nbst=12 nbcs=0 fctl_cnt=0 send_cnt=12 tree_cnt=12
orph_cnt=12 dbuf=0/0 mbuf=0/16384
 20 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x17bac40
iocb=0x4d34d0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x19e6220 st0=7 err=5 maxid=27 lastid=27
flg=0x1000 nbst=1 nbcs=0 fctl_cnt=0 send_cnt=1 tree_cnt=1 orph_cnt=1
dbuf=0/0 mbuf=0/16384
 22 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x18d2410
iocb=0x4d34d0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x1a0dab0 st0=7 err=5 maxid=107 lastid=107
flg=0x1000 nbst=5 nbcs=0 fctl_cnt=0 send_cnt=5 tree_cnt=5 orph_cnt=5
dbuf=0/0 mbuf=0/16384



> And I *think* (and hope) that with these 2 patches on top of latest 1.8
> we're OK now. What I would appreciate quite a lot if you're willing to
> let me abuse your time is to either git pull or apply
> 0001-MINOR-h2-add-the-error-code-and-the-max-last-stream-.patch on top
> of your up-to-date branch, then apply
> 0001-WIP-h2-try-to-address-possible-causes-for-the-close_.patch then
> apply h2-error.diff and test again.
>

Now I'll add both patches (WIP-h2 and h2-error.diff) and give it a try in
production.

> I think we're about to nail it down, to be honest ;-)

That would be great :-)
Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-24 Thread Milan Petruželka
Hi Willy,

> Do you *think* that you got less CLOSE_WAITs or that the latest fixes
> didn't change anything ? I suspect that for some reason you might be
> hit by several bugs, which is what has complicated the diagnostic, but
> that's just pure guess.
>
>
I'm not sure. I left the patched haproxy running over the last weekend. I
have a slight feeling that there were fewer hanging connections than over
the previous week, but it could be because of lower weekend traffic. Now I'm
running the latest Git version (1.8.12-5e100b-15) and I'll compare the rate
at which blocked connections accumulate against vanilla 1.8.12 from last
week.

> Oh I'm just seeing you already did that in the next e-mail. Thank you :-)

I'll send more extended "show fd" dumps as soon as I catch some more.
Please let me know if you add more h2 state info into the 1.8 Git version
and I'll run it in production to get more info.
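
For reference, a rough loop like the one below can collect those dumps
automatically (just a sketch - the stats socket address and the interval are
assumptions, adjust them to your setup):

HASOCK="UNIX-CONNECT:/var/run/haproxy/admin.sock"   # assumption: your "stats socket" path
while true; do
    if netstat -aptn 2>/dev/null | grep haproxy | grep -q CLOSE_WAIT; then
        { date; echo "show fd" | socat - "$HASOCK"; } >> show-fd-dumps.log
    fi
    sleep 300
done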

> So we have this:
>
>  25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
> owner=0x24f0a70
> iocb=0x4d34c0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
>
> fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x1000 nbst=8 nbcs=0
>
> fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 dbuf=0/0 mbuf=0/16384
>
>
>   - st0=7 => H2_CS_ERROR2 : an error was sent, either it succeeded or
> could not be sent and had to be aborted nonetheless ;
>
>   - flg=1000 => H2_CF_GOAWAY_SENT : the GOAWAY frame was sent to the mux
> buffer.
>
>   - nbst=8 => 8 streams still attached
>
>   - nbcs=0 => 0 conn_streams found (application layer detached or not
> attached yet)
>
>   - send_cnt=8 => 8 streams still in the send_list, waiting for the mux
> to pick their contents.
>
>   - tree_cnt=8 => 8 streams known in the tree (hence they are still valid
> from the H2 protocol perspective)
>
>   - orph_cnt=8 => 8 streams are orphaned : these streams have quit at the
> application layer (very likely a timeout).
>
>   - mbuf=0/16384 : the mux buffer is empty but allocated. It's not very
> common.
>
> At this point what it indicates is that :
>   - 8 streams were active on this connection and a response was sent (at
> least partially) and probably waited for the mux buffer to be empty
> due to data from other previous streams. I'm realising it would be
> nice to also report the highest stream index to get an idea of the
> number of past streams on the connection.
>
>   - an error happened (protocol error, network issue, etc, no more info
> at the moment) and caused haproxy to emit a GOAWAY frame. While doing
> so, the pending streams in the send_list were not destroyed.
>
>   - then for an unknown reason the situation doesn't move anymore. I'm
> realising that one case I figured in the past with an error possibly
> blocking the connection at least partially covers one point here, it
> causes the mux buffer to remain allocated, so this patch would have
> caused it to be released, but it's still incomplete.
>
> Now I have some elements to dig through, I'll try to mentally reproduce
> the complex sequence of a blocked response with a GOAWAY being sent at
> the same time to see what happens.
>
>
Thanks a lot for the detailed description.
Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-23 Thread Milan Petruželka
Hi,

I've compiled the latest haproxy 1.8.12 from the Git repo (HAProxy version
1.8.12-5e100b-15, released 2018/07/20) with the latest h2 patches and
extended h2 debug info. After some time I caught one CLOSE_WAIT connection.
Here is the extended "show fd" debug output:
 25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x24f0a70
iocb=0x4d34c0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x1000 nbst=8 nbcs=0
fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 dbuf=0/0 mbuf=0/16384

LSOF CLOSE_WAIT
haproxy 26364 haproxy   25u IPv4 7140390  0t0 TCP
ip:https->ip:50041 (CLOSE_WAIT)

Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-23 Thread Milan Petruželka
On Fri, 20 Jul 2018 at 14:36, Milan Petruželka  wrote:

>
> I've applied both patches to vanilla haproxy 1.8.12. I'll leave it running
> and report back.
>

>

Hi,

After the weekend the CLOSE_WAIT connections are still there. What does
cflg=0x80203300 in "show fd" mean? FDs with cflg=0x80203300 are either in
CLOSE_WAIT or shown by lsof as "sock - protocol: TCP" - see FDs 14, 15, 16,
18, 19 and 25 in the following dumps. Also, the sockets lsof reports as
"sock - protocol: TCP" can't be found in netstat at all.
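
A rough way to cross-check the two views could look like this (a sketch -
the stats socket address is an assumption, and it relies on lsof printing
the FD in its 4th column):

HASOCK="UNIX-CONNECT:/var/run/haproxy/admin.sock"   # assumption
echo "show fd" | socat - "$HASOCK" > show-fd.txt
lsof -nP -c haproxy | awk '/CLOSE_WAIT|protocol: TCP/ {sub(/[a-z]+$/, "", $4); print $4}' |
while read -r fd; do
    grep -E "^ *${fd} :" show-fd.txt
done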

SHOW FD 3300
 14 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x23d0340
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x2494cc0
 15 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x245c6f0
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x23c1db0
 16 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x25598e0
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x23d0900
 18 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x23940a0
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x242a030
 19 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x24a8b90
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x24820b0
 25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x2457a10
iocb=0x4d4c90(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x2394660

LSOF
haproxy 31313 haproxy    0u  CHR  136,1  0t0   4
/dev/pts/1
haproxy 31313 haproxy    1w FIFO   0,10  0t0 2004495
pipe
haproxy 31313 haproxy    2w FIFO   0,10  0t0 2004495
pipe
haproxy 31313 haproxy    3u  a_inode   0,1107017
[eventpoll]
haproxy 31313 haproxy    4u unix 0x88042aa3b400  0t0 2002869
/www/server/haproxy/cmd.sock.31313.tmp type=STREAM
haproxy 31313 haproxy    5u IPv4 2002872  0t0 TCP
some.ip:http (LISTEN)
haproxy 31313 haproxy    6u IPv4 2002873  0t0 TCP
some.ip:https (LISTEN)
haproxy 31313 haproxy    7u IPv4 2002874  0t0 TCP
*:http-alt (LISTEN)
haproxy 31313 haproxy    8u IPv4 2002875  0t0 TCP
*:8443 (LISTEN)
haproxy 31313 haproxy    9r FIFO   0,10  0t0 2002876
pipe
haproxy 31313 haproxy   10w FIFO   0,10  0t0 2002876
pipe
haproxy 31313 haproxy   11u IPv4 6560416  0t0 TCP
some.ip:https->some.ip:49375 (ESTABLISHED)
haproxy 31313 haproxy   12u IPv4 2002883  0t0 UDP
*:52068
haproxy 31313 haproxy   13u IPv4 6656750  0t0 TCP
some.ip:https->some.ip:50544 (ESTABLISHED)
haproxy 31313 haproxy   14u IPv4 4951212  0t0 TCP
some.ip:https->some.ip:4 (CLOSE_WAIT)
haproxy 31313 haproxy   15u sock 0,8  0t0 4111815
protocol: TCP
haproxy 31313 haproxy   16u sock 0,8  0t0 6236118
protocol: TCP
haproxy 31313 haproxy   17u IPv4 6657419  0t0 TCP
some.ip:https->some.ip:64934 (ESTABLISHED)
haproxy 31313 haproxy   18u sock 0,8  0t0 2653890
protocol: TCP
haproxy 31313 haproxy   19u IPv4 5699053  0t0 TCP
some.ip:https->some.ip:59601 (CLOSE_WAIT)
haproxy 31313 haproxy   20u IPv4 6656756  0t0 TCP
some.ip:https->some.ip:29233 (ESTABLISHED)
haproxy 31313 haproxy   21u IPv4 6656760  0t0 TCP
some.ip:https->some.ip:59058 (ESTABLISHED)
haproxy 31313 haproxy   22u IPv4 6654620  0t0 TCP
some.ip:https->some.ip:49306 (ESTABLISHED)
haproxy 31313 haproxy   23u IPv4 6656769  0t0 TCP
some.ip:https->some.ip:17513 (ESTABLISHED)
haproxy 31313 haproxy   25u IPv4 5873818  0t0 TCP
some.ip:https->some.ip:58413 (CLOSE_WAIT)
haproxy 31313 haproxy   26u unix 0x8802f924  0t0 6656772
type=STREAM
haproxy 31313 haproxy   27u IPv4 6656639  0t0 TCP
some.ip:https->some.ip:2926 (ESTABLISHED)

SHOW FD
  4 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0 owner=0x232ac80
iocb=0x4c0be0(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=GLOBAL
  5 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0 owner=0x232ce80
iocb=0x4c0be0(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-http
  6 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0 owner=0x232d390
iocb=0x4c0be0(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-http
  7 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlc] cache=0 owner=0x234cb00
iocb=0x4c0be0(listener_accept) tmask=0x
uma

Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-20 Thread Milan Petruželka
On Fri, 20 Jul 2018 at 13:37, Willy Tarreau  wrote:

> Thank you Janusz for testing. I've also merged the other patch that could
> lead to some CLOSE_WAIT that we tested a few months ago. It was different
> but maybe you were suffering from the two causes.
>

I've applied both patches to vanilla haproxy 1.8.12. I'll leave it running
and report back.

BUG/MEDIUM: h2: never leave pending data in the output buffer on close
BUG/MEDIUM: h2: make sure the last stream closes the connection after a
timeout

Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-04 Thread Milan Petruželka
>
> 20180629.1347 mpeh2 fd25 h2c_error - st04 fl0002 err05
> Just hit h2c_error - H2_ERR_STREAM_CLOSED
>

After adding more debug I found the following pattern around h2c_error in
hanging connections:

... everything OK until now
20180629.1826 e901:backend.srvrep[000e:001a]: HTTP/1.1 200 OK
20180629.1826 e901:backend.srvcls[000e:adfd]
20180629.1826 mpeh2 fd14 h2s_close/h2c  - id006d h2c_st04
h2c_fl streams:12
20180629.1826 mpeh2 fd14 h2s_close/real - id006d st04 fl3101
streams:12 -> 11

h2c_error
20180629.1826 mpeh2 fd14 h2_process_demux/11 - st04 fl
20180629.1826 mpeh2 fd14 h2c_error - st04 fl err05
20180629.1826 e8e7:backend.srvcls[000e:adfd]
20180629.1826 mpeh2 fd14 h2_process_mux/01a - st06 fl
20180629.1826 mpeh2 fd14 h2_process_mux/01b - st07 fl0100
20180629.1826 e8dd:backend.srvcls[000e:adfd]

A few more streams closed before the log for file descriptor 14 ends and
the connection hangs in CLOSE_WAIT:
20180629.1827 x01.clicls[000e:adfd]
20180629.1827 e8dd:backend.closed[000e:adfd]
20180629.1827 e8df:backend.clicls[000e:adfd]
20180629.1827 e8df:backend.closed[000e:adfd]
20180629.1827 mpeh2 fd14 h2s_destroy - id0045 st06 fl3081 streams:11
20180629.1827 mpeh2 fd14 h2s_close/h2c  - id0045 h2c_st07
h2c_fl0100 streams:11
20180629.1827 mpeh2 fd14 h2s_close/real - id0045 st06 fl3081
streams:11 -> 10

I have not seen this pattern (h2_process_demux/11 followed by h2c_error and
h2_process_mux/01a + h2_process_mux/01b) in other connections - only in
those in the CLOSE_WAIT state.
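
To check other connections for the same sequence, something like this can
be run against the debug output (a sketch - the log path and the fd number
are just examples):

grep 'mpeh2 fd14 ' /var/log/haproxy.log \
  | grep -E 'h2_process_demux/11|h2c_error|h2_process_mux/01[ab]'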

Here is the piece of code with the added h2_process_demux/11 debug, from
the haproxy 1.8.12 sources:

/* RFC7540#5.1:closed: if this state is reached as a
 * result of sending a RST_STREAM frame, the peer that
 * receives the RST_STREAM might have already sent
 * frames on the stream that cannot be withdrawn. An
 * endpoint MUST ignore frames that it receives on
 * closed streams after it has sent a RST_STREAM
 * frame. An endpoint MAY choose to limit the period
 * over which it ignores frames and treat frames that
 * arrive after this time as being in error.
 */
if (!(h2s->flags & H2_SF_RST_SENT)) {
        /* RFC7540#5.1:closed: any frame other than
         * PRIO/WU/RST in this state MUST be treated as
         * a connection error
         */
        if (h2c->dft != H2_FT_RST_STREAM &&
            h2c->dft != H2_FT_PRIORITY &&
            h2c->dft != H2_FT_WINDOW_UPDATE) {
                send_log(NULL, LOG_NOTICE,
                         "mpeh2 fd%d h2_process_demux/11 - st%02x fl%08x\n",
                         mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
                h2c_error(h2c, H2_ERR_STREAM_CLOSED);
                goto strm_err;
        }
}

Here is the piece of code with the added h2_process_mux/01a and 01b debug,
from the haproxy 1.8.12 sources:

 fail:
        if (unlikely(h2c->st0 >= H2_CS_ERROR)) {
                send_log(NULL, LOG_NOTICE,
                         "mpeh2 fd%d h2_process_mux/01a - st%02x fl%08x\n",
                         mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
                if (h2c->st0 == H2_CS_ERROR) {
                        if (h2c->max_id >= 0) {
                                h2c_send_goaway_error(h2c, NULL);
                                if (h2c->flags & H2_CF_MUX_BLOCK_ANY)
                                        return 0;
                        }

                        h2c->st0 = H2_CS_ERROR2; // sent (or failed hard) !
                }
                send_log(NULL, LOG_NOTICE,
                         "mpeh2 fd%d h2_process_mux/01b - st%02x fl%08x\n",
                         mpeh2_h2c_fd(h2c), mpeh2_h2c_st0(h2c), mpeh2_h2c_flags(h2c));
                return 1;
        }

I hope this helps in isolating the problem. If it's not enough, I can add
more debug to the h2 mux if someone with better knowledge of the source
code and the h2 protocol suggests where.

Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-29 Thread Milan Petruželka
On Fri, 29 Jun 2018 at 11:19, Milan Petruželka  wrote:

> I've added more debug into h2s_close to see not only h2s state and flags
> but also h2c state and flags. My only way to reproduce the bug is to let
> Haproxy run until some of its FDs fall into CLOSE_WAIT. After I catch some,
> I'll report back.
>

I just caught new CLOSE_WAIT.

... continuation of a longer h2 connection ...

20180629.1347 mpeh2 fd25 h2c_stream_new - id0117 st00 fl
streams:0 -> 1
20180629.1347 000267eb:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id0119 st00 fl
streams:1 -> 2
20180629.1347 000267ec:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id011b st00 fl
streams:2 -> 3
20180629.1347 000267ed:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id011d st00 fl
streams:3 -> 4
20180629.1347 000267ee:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id011f st00 fl
streams:4 -> 5
20180629.1347 000267ef:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id0121 st00 fl
streams:5 -> 6
20180629.1347 000267f0:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id0123 st00 fl
streams:6 -> 7
20180629.1347 000267f1:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id0125 st00 fl
streams:7 -> 8
20180629.1347 000267f2:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
20180629.1347 mpeh2 fd25 h2c_stream_new - id0127 st00 fl
streams:8 -> 9
20180629.1347 000267f3:frntend.accept(0006)=0019 from [some_ip:52750]
ALPN=h2
OK, we have some new h2 streams

20180629.1347 000267eb:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267ec:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267ed:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267ee:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267ef:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267f0:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267f1:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267f2:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267f3:frntend.clireq[0019:]: GET /some/uri HTTP/1.1
20180629.1347 000267eb:backend.srvrep[0019:0010]: HTTP/1.1 200 OK
20180629.1347 000267ef:backend.srvrep[0019:001b]: HTTP/1.1 200 OK
20180629.1347 000267f1:backend.srvrep[0019:001d]: HTTP/1.1 200 OK
20180629.1347 000267f3:backend.srvrep[0019:001f]: HTTP/1.1 200 OK
20180629.1347 000267f2:backend.srvrep[0019:adfd]: HTTP/1.1 200 OK
20180629.1347 000267f2:backend.srvcls[0019:adfd]
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id0125 h2c_st02
h2c_fl streams:9
20180629.1347 mpeh2 fd25 h2s_close/real - id0125 st04 fl3001
streams:9 -> 8
20180629.1347 000267ee:backend.srvrep[0019:adfd]: HTTP/1.1 200 OK
20180629.1347 000267ee:backend.srvcls[0019:adfd]
20180629.1347 000267f0:backend.srvrep[0019:adfd]: HTTP/1.1 200 OK
20180629.1347 000267f0:backend.srvcls[0019:adfd]
20180629.1347 000267f4:frntend.clicls[0019:]
20180629.1347 000267f4:frntend.closed[0019:]
20180629.1347 mpeh2 fd25 h2s_destroy - id0125 st07 fl3003 streams:8
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id0125 h2c_st02
h2c_fl streams:8
20180629.1347 mpeh2 fd25 h2s_close/dumy - id0125 st07 fl3003
streams:8 -> 8
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id011d h2c_st02
h2c_fl streams:8
20180629.1347 mpeh2 fd25 h2s_close/real - id011d st04 fl3001
streams:8 -> 7
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id0121 h2c_st02
h2c_fl streams:7
20180629.1347 mpeh2 fd25 h2s_close/real - id0121 st04 fl3001
streams:7 -> 6
20180629.1347 000267ed:backend.srvrep[0019:0017]: HTTP/1.1 200 OK
20180629.1347 000267ec:backend.srvrep[0019:adfd]: HTTP/1.1 200 OK
20180629.1347 000267ec:backend.srvcls[0019:adfd]
20180629.1347 000267f5:frntend.clicls[0019:]
20180629.1347 000267f5:frntend.closed[0019:]
20180629.1347 mpeh2 fd25 h2s_destroy - id011d st07 fl3003 streams:6
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id011d h2c_st02
h2c_fl0002 streams:6
20180629.1347 mpeh2 fd25 h2s_close/dumy - id011d st07 fl3003
streams:6 -> 6
20180629.1347 000267f6:frntend.clicls[0019:]
20180629.1347 000267f6:frntend.closed[0019:]
20180629.1347 mpeh2 fd25 h2s_destroy - id0121 st07 fl3003 streams:6
20180629.1347 mpeh2 fd25 h2s_close/h2c  - id0121 h2c_st02
h2c_fl0002 streams:6
20180629.1347 mpeh2 fd25 h2s_close/dumy - id0121 st07 fl3003
streams:6 -> 6
20180629

Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-29 Thread Milan Petruželka
Hi Willy,

I'm back at work after 2 weeks on the beach in Dalmatia. I've patched my
Haproxy 1.8.11 with all three patches discussed here in the last two weeks.
It didn't help. Then I tried to run Haproxy with debug enabled. The last log
entries from an FD hanging in CLOSE_WAIT look like this:

00020535:frntend.accept(0006)=000b from [ip:58321] ALPN=h2
00020535:frntend.clireq[000b:]: POST /some/uri HTTP/1.1
00020535:backend.srvrep[000b:0015]: HTTP/1.1 200 OK
00020535:backend.srvcls[000b:adfd]
00020536:frntend.clicls[000b:]
00020536:frntend.closed[000b:]
00020537:frntend.clicls[000b:]
00020537:frntend.closed[000b:]
00020538:frntend.clicls[000b:]
00020538:frntend.closed[000b:]
00020516:backend.srvcls[000b:adfd]
0002051a:backend.srvcls[000b:adfd]
00020514:backend.srvcls[000b:adfd]
00020514:backend.clicls[000b:adfd]
00020514:backend.closed[000b:adfd]
00020516:backend.clicls[000b:adfd]
00020516:backend.closed[000b:adfd]
0002051a:backend.clicls[000b:adfd]
0002051a:backend.closed[000b:adfd]
0002051d:backend.srvcls[000b:adfd]
0002051d:backend.clicls[000b:adfd]
0002051d:backend.closed[000b:adfd]
0002051e:backend.clicls[000b:adfd]
0002051e:backend.closed[000b:adfd]
0002051f:backend.srvcls[000b:adfd]
0002051f:backend.clicls[000b:adfd]
0002051f:backend.closed[000b:adfd]
00020528:backend.srvcls[000b:adfd]
00020528:backend.clicls[000b:adfd]
00020528:backend.closed[000b:adfd]
00020529:backend.srvcls[000b:adfd]
00020529:backend.clicls[000b:adfd]
00020529:backend.closed[000b:adfd]
0002052a:backend.srvcls[000b:adfd]
0002052a:backend.clicls[000b:adfd]
0002052a:backend.closed[000b:adfd]
0002052b:backend.srvcls[000b:adfd]
0002052b:backend.clicls[000b:adfd]
0002052b:backend.closed[000b:adfd]
00020535:backend.clicls[000b:adfd]
00020535:backend.closed[000b:adfd]

I decided to add some quick'n'dirty debug messages into mux_h2.c to see
more details. I modified the send_log function to write to the console and
used it for digging inside the h2 mux. I'm not attaching the full patch,
because it's ugly and nothing to be proud of - just an example of a patched
function to give you the feeling:

static inline void h2s_close(struct h2s *h2s)
{
        send_log(NULL, LOG_NOTICE,
                 "mpeh2 fd%d h2s_close - id%08x st%02x fl%08x streams:%d\n",
                 mpeh2_h2s_fd(h2s), mpeh2_h2s_id(h2s), mpeh2_h2s_st(h2s),
                 mpeh2_h2s_flags(h2s), h2s->h2c->nb_streams);

        if (h2s->st != H2_SS_CLOSED)
                h2s->h2c->nb_streams--;
        h2s->st = H2_SS_CLOSED;
}

This is what a successful connection looks like:

20180628.1614 mpeh2 fd17 h2c_frt_init
20180628.1614 mpeh2 fd17 h2c_stream_new - id0001 st00 fl
20180628.1614 004c:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 004c:frntend.clireq[0011:]: GET /some/uri HTTP/1.1
20180628.1614 004c:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 004c:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id0001 st04 fl3001 streams:1
20180628.1614 004f:frntend.clicls[0011:]
20180628.1614 004f:frntend.closed[0011:]
20180628.1614 mpeh2 fd17 h2s_destroy - id0001 st07 fl3003
20180628.1614 mpeh2 fd17 h2s_close - id0001 st07 fl3003 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id0003 st00 fl
20180628.1614 0056:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 0056:frntend.clireq[0011:]: GET /some/uri HTTP/1.1
20180628.1614 0056:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 0056:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id0003 st04 fl3001 streams:1
20180628.1614 0057:frntend.clicls[0011:]
20180628.1614 0057:frntend.closed[0011:]
20180628.1614 mpeh2 fd17 h2s_destroy - id0003 st07 fl3003
20180628.1614 mpeh2 fd17 h2s_close - id0003 st07 fl3003 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id0005 st00 fl
20180628.1614 0059:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 0059:frntend.clireq[0011:]: POST /some/uri
HTTP/1.1
20180628.1614 0059:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 0059:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id0005 st04 fl3101 streams:1
20180628.1614 005b:frntend.clicls[0011:]
20180628.1614 005b:frntend.closed[0011:]
20180628.1614 mpeh2 fd17 h2s_destroy - id0005 st07 fl3103
20180628.1614 mpeh2 fd17 h2s_close - id0005 st07 fl3103 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id0007 st00 fl
20180628.1614 005d:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 005d:frntend.clireq[0011:]: GET /some/uri HTTP/1.1
20180628.1614 005d:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 005d:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id0007 st04 fl3001 streams:1
20180628.1614 005e:frntend.clicls[0011:]
20180628.1614 

Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-08 Thread Milan Petruželka
On Wed, 6 Jun 2018 at 11:20, Willy Tarreau  wrote:

> Hi Milan,
>
> On Wed, Jun 06, 2018 at 11:09:19AM +0200, Milan Petruzelka wrote:
> > Hi Willy,
> >
> > I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
> > over last night. It started at 17:19 like this:
> >
> > "Packet No.","Time in
> seconds","Source","Destination","Protocol","Length","Info"
> > "1","0.00","ip_client","ip_haproxy_server","TCP","62","64311
> > 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
> > "2","0.001049","ip_haproxy_server","ip_client","TCP","62","443
> > 64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
>
> Sure, it's extremely useful. It looks like a shutdown is ignored when
> waiting for the H2 preface. The FD is not being polled for reading, so
> I tend to conclude that either it was disabled for a reason I still do
> not know, or it used to be enabled, the shutdown was received then the
> polling was disabled. But that doesn't appear in the connection flags.
> So it seems that the transition between the end of handshake and the
> start of parsing could be at fault. Maybe we refrain from entering the
> H2 state machine because of the early shutdown. I'm going to have a look
> in that direction.


Hi Willy,

Here is some additional info about the hanging CLOSE_WAIT connection
described in my last message - a screenshot from our log management
(https://pasteboard.co/HoVCR0b.jpg) and a text version:

No.  Time                         hap.ts.accept                hap.client.port  hap.term.state  hap.rq.method  hap.rsp.status  hap.rsp.size
1    June 5th 2018, 17:19:22.913  June 5th 2018, 17:19:22.435  64311                            GET            200             17,733
2    June 5th 2018, 17:19:23.814  June 5th 2018, 17:19:23.806  64311                            GET            200             4,671
3    June 5th 2018, 17:19:23.814  June 5th 2018, 17:19:23.806  64311                            GET            200             4,481
4    June 5th 2018, 17:19:23.820  June 5th 2018, 17:19:23.808  64311                            GET            200             4,613
5    June 5th 2018, 17:19:23.821  June 5th 2018, 17:19:23.808  64311                            GET            200             4,584
6    June 5th 2018, 17:19:24.967  June 5th 2018, 17:19:24.961  64311                            GET            200             2,786
7    June 5th 2018, 17:19:28.945  June 5th 2018, 17:19:25.049  64311                            GET            200             78,325
8    June 5th 2018, 17:19:28.945  June 5th 2018, 17:19:25.049  64311                            GET            200             46,830
9    June 5th 2018, 17:19:31.617  June 5th 2018, 17:19:24.707  64311            CL--            GET            200             13,086
10   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.049  64311            CD--            GET            200             49,162
11   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.049  64311            CD--            GET            200             40,328
12   June 5th 2018, 17:19:31.618  June 5th 2018, 17:19:25.066  64311            CL--            POST           200             3,198

As you can see, the client first made 8 successful requests over the http/2
connection through Haproxy to our backend server. The remaining 4 requests
finished with termination states CL-- or CD--, probably because of the FIN
packet received from the client.

C : the TCP session was unexpectedly aborted by the client
D : the session was in the DATA phase.
L : the proxy was still transmitting LAST data to the client while the
server had already finished. This one is very rare as it can only happen
when the client dies while receiving the last packets.

It looks like Haproxy knows about the client disconnection in the muxed
requests inside the frontend http/2 connection. It just doesn't seem to
propagate this knowledge to the frontend connection, leaving it hanging in
the CLOSE_WAIT state. Sorry for not sending this piece of the log last time,
I didn't expect to see any of those CL-- / CD-- requests.
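
Such requests can be spotted in the access log with something like the
following (a sketch - it assumes the default HTTP log format and this log
path):

grep -E ' (CL|CD)-- ' /var/log/haproxy.log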

I'll switch http/2 off on the site because I'll be AFK for 2 weeks. After
that I'll be ready to provide any additional info to hunt this one down, or
to test a patched version if available.

Thank you, best regards
Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-06 Thread Milan Petruželka
Hi Willy,

I've tracked one of the connections hanging in CLOSE_WAIT state with
tcpdump overnight. It started at 17:19 like this:

"Packet No.","Time in
seconds","Source","Destination","Protocol","Length","Info"
"1","0.00","ip_client","ip_haproxy_server","TCP","62","64311  >
443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
"2","0.001049","ip_haproxy_server","ip_client","TCP","62","443  >
64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
"3","0.127239","ip_client","ip_haproxy_server","TCP","54","64311  >
443 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
"4","0.127344","ip_client","ip_haproxy_server","TLSv1.2","571","Client
Hello"
"5","0.130304","ip_haproxy_server","ip_client","TLSv1.2","2974","Server
Hello, Certificate"
"6","0.130336","ip_haproxy_server","ip_client","TLSv1.2","310","Server
Key Exchange, Server Hello Done"

After some 13 seconds the client sent its last data, which the haproxy
server acknowledged.


"319","13.781347","ip_client","ip_haproxy_server","TLSv1.2","96","Application
Data"
"320","13.781365","ip_haproxy_server","ip_client","TCP","54","443  >
64311 [ACK] Seq=240156 Ack=3689 Win=36448 Len=0"

Then the client sent a FIN packet, which the server acknowledged again:

"321","16.292016","ip_client","ip_haproxy_server","TCP","54","64311  >
443 [FIN, ACK] Seq=3689 Ack=240156 Win=64240 Len=0"
"322","16.329574","ip_haproxy_server","ip_client","TCP","54","443  >
64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0"

From then on the client sent only TCP keepalives every 45s, which the
server always acknowledged.

"323","61.443121","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"324","61.443216","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"325","106.528926","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"326","106.529117","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
...

After some 4.5 hours (at 21:51) the client sent its last keepalive, which
the server acknowledged. There were no more packets after that.

"1043","16284.644240","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1044","16284.644354","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"1045","16329.797223","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311  >  443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1046","16329.797274","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443  >  64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"

The next morning at 10:40 I could still see the hanging connection on the
server:

netstat -aptn|grep 64311
tcp  430  0 ip_haproxy_server:443  ip_client:64311
 CLOSE_WAIT  916/haproxy

lsof|grep 64311
haproxy 916  haproxy   40u IPv4  106204553
  0t0 TCP ip_haproxy_server:https->ip_client:64311 (CLOSE_WAIT)

echo "show fd" | socat - $HASOCK | grep "40 :"
40 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1648d80
iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x15e9460

I hope this can help in tracking the problem down.

Best regards,
Milan


Re: Connections stuck in CLOSE_WAIT state with h2

2018-06-01 Thread Milan Petruželka
Hi Willy,

On Thu, 31 May 2018 at 21:07, Willy Tarreau  wrote:

> Thank you. It's pretty constant, so it doesn't really seem load-dependent
> (unless your load is reasonably stable of course).


Here is a 3-day sample from the "growth" period -
https://pasteboard.co/HnQu44t.png. The overall trend is steady growth at a
somewhat varying rate. There are also short periods when the number of
frontend connections decreases. These periods seem to match times when the
request rate drops below 5000/hour.
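
A trend like this can be sampled with a simple loop such as the following
(a sketch - the interval and the output file are assumptions):

while true; do
    n=$(netstat -aptn 2>/dev/null | grep haproxy | grep -c CLOSE_WAIT)
    echo "$(date '+%F %T') close_wait=$n" >> close-wait-trend.log
    sleep 60
done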


> I'm seeing that the number of entries in "show fd" is not that much, did
> you truncate it or
> is it all you had? I'm asking because suddenly that makes me think that
> it could be possible that we're leaking sockets that are not correctly
> closed (for whatever reason). That would explain why they're not visible
> in "show fd" but remain in CLOSE_WAIT. It's absolutely not what I was
> expecting last time and I could have overlooked this possibility.
>

"show fd" list was not truncated. It's really shorter than the number of
CLOSE_WAIT netstat entries. Extra netstat "CLOSE_WAIT" connections
invisible in "show fg" may come from socket leak. See below for current
sample - 111 CLOSE_WAIT vs. 64 "show fd".

Why are there so many "show fd" sockets with cflg=0x80203300? They are the
one which keep growing. Diff between my first "show fd" from yesterday and
new from today clearly shows that a lot of them did not change and new
appeared.

cat show-fd-2018-06-01.txt | cut -c128-142 | grep cflg | sort | uniq -c |
sort -n
  1 cflg=0x00202306
  1 cflg=0x00241300
 10 cflg=0x80203306
 45 cflg=0x80203300 !

wc -l *
   42 show-fd-2018-05-31.txt
   64 show-fd-2018-06-01.txt

diff -u *
--- show-fd-2018-05-31.txt 2018-06-01 10:58:21.354207477 +0200
+++ show-fd-2018-06-01.txt 2018-06-01 10:58:50.886461366 +0200
@@ -12,7 +12,7 @@
  17 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x2012970 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x22a7480
  18 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x21b7e30 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x210b610
  19 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x1efa590 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x1f02610
- 20 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x21b83a0 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203306 fe=fe-http mux=H2 mux_ctx=0x22a2800
+ 20 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x1fe8a90 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x2311bd0
  21 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x20a0f10 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x216ecf0
  22 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x1f02a40 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x20d03b0
  23 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x21b8e80 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x22d2ff0
@@ -25,18 +25,40 @@
  30 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x217cac0 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x22cd210
  31 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x2163540 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x209b2e0
  32 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x209b750 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x20a02b0
- 33 : st=0x25(R:PrA W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x20ce280 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203306 fe=fe-http mux=H2 mux_ctx=0x2321d60
+ 33 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x20c07c0 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x2328430
  34 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x2162d40 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x1fbe580
+ 35 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x1f914d0 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x1ed2da0
  36 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x1f05a70 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x20c0d80
  37 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
owner=0x20e43a0 iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0
cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x1f18200
  38 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0

Re: Connections stuck in CLOSE_WAIT state with h2

2018-05-31 Thread Milan Petruželka
I've got the same problem described by Janusz Dziemidowicz. After enabling
h2 on Haproxy 1.8.8, it slowly accumulates frontend connections. A reload or
restart cuts them down, but they start to grow again. An upgrade to Haproxy
1.8.9 did not help. See a 40-day graph here -
https://pasteboard.co/HnHlKdg.png

Milan Petruzelka


./bin/haproxy -vv

HA-Proxy version 1.8.9-83616ec 2018/05/18
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = native
  CC  = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
-Wdeclaration-after-statement -fwrapv -fno-strict-overflow -Wno-unused-label
  OPTIONS = USE_LIBCRYPT=1 USE_ZLIB=1 USE_VSYSCALL=1 USE_OPENSSL=1
USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.38 2015-11-23
Running on PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace



netstat -aptn|grep haproxy|cut -c 69-80|sort|uniq -c|sort -n

  4 LISTEN
 20 ESTABLISHED
102 CLOSE_WAIT



netstat -apton|grep CLOS|grep hapro|head

tcp 7292  0 MY_IP:443  CLIENT_IP:53575 CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  211  0 MY_IP:443  CLIENT_IP:50930 CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  723  0 MY_IP:443  CLIENT_IP:4873  CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  242  0 MY_IP:443  CLIENT_IP:50831 CLOSE_WAIT
5205/haproxy off (0.00/0/0)
tcp1  0 MY_IP:443  CLIENT_IP:18309 CLOSE_WAIT
5205/haproxy off (0.00/0/0)
tcp  200  0 MY_IP:443  CLIENT_IP:64678  CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  242  0 MY_IP:443  CLIENT_IP:50085  CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  242  0 MY_IP:443  CLIENT_IP:64613  CLOSE_WAIT
16694/haproxy off (0.00/0/0)
tcp  347  0 MY_IP:443  CLIENT_IP:60401 CLOSE_WAIT
5205/haproxy off (0.00/0/0)
tcp  753  0 MY_IP:443  CLIENT_IP:64146CLOSE_WAIT
16694/haproxy off (0.00/0/0)


> show fd
  4 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlC] cache=0 owner=0x1eb4bf0
iocb=0x4be950(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=GLOBAL
  5 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlC] cache=0 owner=0x1eb6df0
iocb=0x4be950(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-http
  6 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlC] cache=0 owner=0x1eb7300
iocb=0x4be950(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-http
  8 : st=0x05(R:PrA W:pra) ev=0x01(heopI) [nlC] cache=0 owner=0x1ed1b10
iocb=0x4be950(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-service
  9 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [nlC] cache=0 owner=0x1ed2020
iocb=0x4be950(listener_accept) tmask=0x
umask=0xfffe l.st=RDY fe=fe-service
 10 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [nlc] cache=0 owner=0x754758
iocb=0x48cdf0(unknown) tmask=0x umask=0xfffe
 11 : st=0x05(R:PrA W:pra) ev=0x00(heopi) [nlc] cache=0 owner=0x4e7200
iocb=0x4e7200(thread_sync_io_handler) tmask=0x
umask=0xfffe
 13 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1f05c30
iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x1f055a0
 14 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x20ea060
iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x22a1580