Re: [2.2.11] 100% CPU again

2021-04-29 Thread Christopher Faulet

On 22/04/2021 at 09:00, Maciej Zdeb wrote:
On Wed, 21 Apr 2021 at 13:53, Christopher Faulet wrote:


The fix was merged upstream:

* BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
    (http://git.haproxy.org/?p=haproxy.git;a=commit;h=07f88d75)


Thanks! Building 2.2.13 and 2.3.9 with the patches. I'll report back if
anything pops up again.


FYI, 2.3.10 and 2.2.14 have been released with the fix.

--
Christopher Faulet



Re: [2.2.11] 100% CPU again

2021-04-22 Thread Maciej Zdeb
On Wed, 21 Apr 2021 at 13:53, Christopher Faulet wrote:

> The fix was merged upstream:
>
> * BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
>(http://git.haproxy.org/?p=haproxy.git;a=commit;h=07f88d75)
>

Thanks! Building 2.2.13 and 2.3.9 with the patches. I'll report back if
anything pops up again.

Kind regards,


Re: [2.2.11] 100% CPU again

2021-04-21 Thread Robin H. Johnson
On Wed, Apr 21, 2021 at 01:53:32PM +0200, Christopher Faulet wrote:
> On 21/04/2021 at 08:48, Maciej Zdeb wrote:
> > I'm very happy you managed to reproduce a similar issue! :)
> The fix was merged upstream:
> 
> * BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
>(http://git.haproxy.org/?p=haproxy.git;a=commit;h=07f88d75)
> 
> It has not been backported yet, but it may be manually applied to 2.3 and
> 2.2 if required. If so, this other one may optionally be applied too:
> 
> * BUG/MEDIUM: mux-h2: Fix dfl calculation when merging CONTINUATION frames
>(http://git.haproxy.org/?p=haproxy.git;a=commit;h=cb1847c7)
Thanks for these patches and the other nghttp patch you sent.

I'll try to get a local build out to our systems (2.2), but based on our
backtraces I'm worried that our issue isn't going to be fixed by these
patches, because HTTP/2 isn't used in our systems yet (it's on the roadmap).

-- 
Robin Hugh Johnson
E-Mail : robb...@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85




Re: [2.2.11] 100% CPU again

2021-04-21 Thread Christopher Faulet

On 21/04/2021 at 08:48, Maciej Zdeb wrote:

I'm very happy you managed to reproduce a similar issue! :)


The fix was merged upstream:

* BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
  (http://git.haproxy.org/?p=haproxy.git;a=commit;h=07f88d75)

It has not been backported yet, but it may be manually applied to 2.3 and 2.2
if required. If so, this other one may optionally be applied too:


* BUG/MEDIUM: mux-h2: Fix dfl calculation when merging CONTINUATION frames
  (http://git.haproxy.org/?p=haproxy.git;a=commit;h=cb1847c7)
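
To illustrate the idea (a schematic sketch only, with hypothetical names; this
is not the actual patch from commit 07f88d75): the problem is a HEADERS or
CONTINUATION frame whose 9-byte frame header plus payload (dfl) can never fit
into the demux buffer, so it must be rejected up front instead of waiting
forever for data that cannot arrive.

/* Schematic sketch, not HAProxy code. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define H2_FRAME_HDR_LEN 9  /* fixed HTTP/2 frame header size (RFC 7540, 4.1) */

/* Hypothetical helper: returns 0 if a frame with payload length <dfl> can
 * eventually be buffered whole in a demux buffer of <bufsize> bytes, and
 * -1 if it never can and should be rejected as a frame-size error.
 */
static int h2_frame_can_fit(uint32_t dfl, size_t bufsize)
{
    if ((size_t)dfl + H2_FRAME_HDR_LEN > bufsize)
        return -1;
    return 0;
}

int main(void)
{
    /* The values seen in the gdb dump from this thread. */
    printf("dfl=16383, bufsize=16384 -> %s\n",
           h2_frame_can_fit(16383, 16384) ? "too large, reject" : "ok");
    return 0;
}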

--
Christopher Faulet



Re: [2.2.11] 100% CPU again

2021-04-21 Thread Christopher Faulet

On 21/04/2021 at 08:48, Maciej Zdeb wrote:

I'm very happy you managed to reproduce a similar issue! :)

Does it affect 2.3.9? I've experienced 100% CPU on it as well, but on 2.3 ALL
threads loop on _do_poll/epoll_wait and no thread was stuck on a particular h2
session. :( I'll double-check and follow up in a new thread for 2.3.9, because
for now it looks like a different bug.


The bug affects all HAProxy versions supporting CONTINUATION frames. When it
happens, the buggy H2 sessions are waiting for more data with a full buffer,
resulting in wakeups in a loop. The threads don't hang, though: incoming
connections are still accepted and processed, but all the CPU is consumed.
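
For anyone trying to picture this, here is a purely schematic sketch of such a
wakeup loop (hypothetical structures and names, not HAProxy code): the I/O
callback keeps asking to be woken for more data even though the buffer is
already full, so it is scheduled again immediately and spins.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct demux {
    size_t buf_size;   /* demux buffer capacity (16384 in the dump)   */
    size_t buf_data;   /* bytes currently buffered                    */
    size_t frame_len;  /* payload length of the frame being parsed    */
    long   wakeups;    /* how many times the I/O callback has run     */
};

/* Called every time the connection is reported "ready". */
static bool io_cb(struct demux *d)
{
    d->wakeups++;

    /* The whole frame (9-byte header + payload) will never fit. */
    if (d->buf_data == d->buf_size && d->frame_len + 9 > d->buf_size) {
        /* Buggy behaviour: report "need more data" although none can be
         * added, so the callback is run again right away, burning CPU.
         */
        return true;
    }
    return false;
}

int main(void)
{
    struct demux d = { .buf_size = 16384, .buf_data = 16384,
                       .frame_len = 16383, .wakeups = 0 };

    /* A real scheduler would re-run io_cb() forever; cap it here. */
    while (io_cb(&d) && d.wakeups < 5)
        ;
    printf("io_cb ran %ld times without making progress\n", d.wakeups);
    return 0;
}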


However, if you observe loops on _do_poll/epoll_wait, it is probably another
bug, because here a "perf top" on haproxy should show H2 functions in the
top 10.


I'll push my fix this morning, so you may just wait a bit to confirm whether it
is the same bug or not.


--
Christopher Faulet



Re: [2.2.11] 100% CPU again

2021-04-20 Thread Maciej Zdeb
I'm very happy you managed to reproduce a similar issue! :)

Does it affect 2.3.9? I've experienced 100% CPU on it as well, but on 2.3 ALL
threads loop on _do_poll/epoll_wait and no thread was stuck on a particular h2
session. :( I'll double-check and follow up in a new thread for 2.3.9, because
for now it looks like a different bug.

Kind regards,
Maciej

On Tue, 20 Apr 2021 at 18:38, Christopher Faulet wrote:

> On 19/04/2021 at 19:54, Maciej Zdeb wrote:
> > Hi,
> >
> > After a couple of weeks running HAProxy 2.2.11, one server started looping
> > on thread 9. If I'm reading this correctly, something went wrong with the
> > h2c at 0x7fd7b08d0530.
> >
> [...]
> >
> > ### (gdb) p *((struct h2c *)0x7fd7b08d0530)
> > $1 = {conn = 0x7fd785b87990, st0 = H2_CS_FRAME_P, errcode =
> H2_ERR_NO_ERROR,
> > flags = 8, streams_limit = 100, max_id = 0, rcvd_c = 0, rcvd_s = 0,
> >ddht = 0x7fd7b0820d60, dbuf = {size = 16384,
> >  area = 0x7fd7b1dec0b0
> >
> "¤\226\205\064\f\212jܧâ\201\004A\fN\177jC]t\027\221cÌd°à\033\\+\205µ?¬Ø÷èÏô¥\006êU1\024\235O",
>
> > data = 16384, head = 48},
> >dsi = 1, dfl = 16383, dft = 1 '\001', dff = 1 '\001', dpl = 0 '\000',
> > last_sid = -1, mbuf = {{size = 32,
> >area = 0x2 , data =
> 1, head =
> > 1}, {size = 0, area = 0x0, data = 0, head = 0} , {
> >size = 0, area = 0x0, data = 0, head = 12}, {size = 1249, area =
> > 0x7fd7b080 "ðHX²×\177", data = 140564347486336, head = 0}}, msi =
> -1, mfl = 0,
> >mft = 32 ' ', mff = -96 ' ', miw = 65535, mws = 65535, mfs = 16384,
> timeout =
> > 2, shut_timeout = 2, nb_streams = 0, nb_cs = 0, nb_reserved = 0,
> >stream_cnt = 0, proxy = 0x5557136cbe60, task = 0x7fd792079be0,
> streams_by_id
> > = {b = {0x0, 0x0}}, send_list = {n = 0x7fd7b08d09d8, p = 0x7fd7b08d09d8},
> >fctl_list = {n = 0x7fd7b08d09e8, p = 0x7fd7b08d09e8}, blocked_list =
> {n =
> > 0x7fd7b08d09f8, p = 0x7fd7b08d09f8}, buf_wait = {target = 0x0, wakeup_cb
> = 0x0,
> >  list = {next = 0x7fd7b08d0a18, prev = 0x7fd7b08d0a18}}, wait_event
> =
> > {tasklet = 0x7fd79e235bf0, events = 0}}
> >
>
> Hi Maciej,
>
> I'm able to reproduce a similar bug by hacking the nghttp2 client to send at
> most 16383 bytes per frame (instead of 16384). By sending headers that are too
> large, we fall into a wakeup loop, waiting for more data while the buffer is
> full.
>
> I have a fix but I must check with Willy how to proceed because I'm not 100%
> sure for now.
>
> --
> Christopher Faulet
>


Re: [2.2.11] 100% CPU again

2021-04-20 Thread Robin H. Johnson
On Tue, Apr 20, 2021 at 06:38:48PM +0200, Christopher Faulet wrote:
> I'm able to reproduce a similar bug by hacking the nghttp2 client to send at
> most 16383 bytes per frame (instead of 16384). By sending headers that are too
> large, we fall into a wakeup loop, waiting for more data while the buffer is
> full.
> 
> I have a fix but I must check with Willy how to proceed because I'm not 100% 
> sure for now.
Can you share the hack to nghttp2 so I can compare against our crash
behaviors?

-- 
Robin Hugh Johnson
E-Mail : robb...@orbis-terrarum.net
Home Page  : http://www.orbis-terrarum.net/?l=people.robbat2
GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85




Re: [2.2.11] 100% CPU again

2021-04-20 Thread Christopher Faulet

On 19/04/2021 at 19:54, Maciej Zdeb wrote:

Hi,

After a couple of weeks running HAProxy 2.2.11, one server started looping on
thread 9. If I'm reading this correctly, something went wrong with the h2c at
0x7fd7b08d0530.



[...]


### (gdb) p *((struct h2c *)0x7fd7b08d0530)
$1 = {conn = 0x7fd785b87990, st0 = H2_CS_FRAME_P, errcode = H2_ERR_NO_ERROR, 
flags = 8, streams_limit = 100, max_id = 0, rcvd_c = 0, rcvd_s = 0,

   ddht = 0x7fd7b0820d60, dbuf = {size = 16384,
     area = 0x7fd7b1dec0b0 
"¤\226\205\064\f\212jܧâ\201\004A\fN\177jC]t\027\221cÌd°à\033\\+\205µ?¬Ø÷èÏô¥\006êU1\024\235O", 
data = 16384, head = 48},
   dsi = 1, dfl = 16383, dft = 1 '\001', dff = 1 '\001', dpl = 0 '\000', 
last_sid = -1, mbuf = {{size = 32,
       area = 0x2 , data = 1, head = 
1}, {size = 0, area = 0x0, data = 0, head = 0} , {
       size = 0, area = 0x0, data = 0, head = 12}, {size = 1249, area = 
0x7fd7b080 "ðHX²×\177", data = 140564347486336, head = 0}}, msi = -1, mfl = 0,
   mft = 32 ' ', mff = -96 ' ', miw = 65535, mws = 65535, mfs = 16384, timeout = 
2, shut_timeout = 2, nb_streams = 0, nb_cs = 0, nb_reserved = 0,
   stream_cnt = 0, proxy = 0x5557136cbe60, task = 0x7fd792079be0, streams_by_id 
= {b = {0x0, 0x0}}, send_list = {n = 0x7fd7b08d09d8, p = 0x7fd7b08d09d8},
   fctl_list = {n = 0x7fd7b08d09e8, p = 0x7fd7b08d09e8}, blocked_list = {n = 
0x7fd7b08d09f8, p = 0x7fd7b08d09f8}, buf_wait = {target = 0x0, wakeup_cb = 0x0,
     list = {next = 0x7fd7b08d0a18, prev = 0x7fd7b08d0a18}}, wait_event = 
{tasklet = 0x7fd79e235bf0, events = 0}}




Hi Maciej,

I'm able to reproduce a similar bug by hacking the nghttp2 client to send at most
16383 bytes per frame (instead of 16384). By sending headers that are too large,
we fall into a wakeup loop, waiting for more data while the buffer is full.
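
A purely hypothetical sketch of the splitting idea (this is not the actual
nghttp2 modification): clamping each HEADERS/CONTINUATION payload to 16383
bytes, one byte short of the usual 16384, produces exactly the dfl=16383
frames seen in the gdb dump.

#include <stddef.h>
#include <stdio.h>

#define HACKED_MAX_PAYLOAD 16383  /* one byte short of the 16 kB default */

/* Hypothetical splitter: how many bytes of the remaining header block go
 * into the next HEADERS/CONTINUATION frame.
 */
static size_t next_frame_payload(size_t remaining)
{
    return remaining < HACKED_MAX_PAYLOAD ? remaining : HACKED_MAX_PAYLOAD;
}

int main(void)
{
    size_t header_block = 40000;  /* an oversized HPACK-encoded header block */
    int frame = 0;

    while (header_block > 0) {
        size_t chunk = next_frame_payload(header_block);
        frame++;
        printf("frame %d: %zu-byte payload (%s)\n", frame, chunk,
               frame == 1 ? "HEADERS" : "CONTINUATION");
        header_block -= chunk;
    }
    return 0;
}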


I have a fix but I must check with Willy how to proceed because I'm not 100% 
sure for now.


--
Christopher Faulet



[2.2.11] 100% CPU again

2021-04-19 Thread Maciej Zdeb
Hi,

After a couple of weeks running HAProxy 2.2.11, one server started looping on
thread 9. If I'm reading this correctly, something went wrong with the h2c at
0x7fd7b08d0530.

### show threads
Thread 1 : id=0x7fd8855a2200 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=24099004674002 now=24099004739108 diff=65106
 curr_task=0
  Thread 2 : id=0x7fd883786700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23776498018239 now=23776498104181 diff=85942
 curr_task=0
  Thread 3 : id=0x7fd882b5b700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23713672613057 now=23713672680973 diff=67916
 curr_task=0
  Thread 4 : id=0x7fd881416700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23752667768434 now=23752667910419 diff=141985
 curr_task=0
  Thread 5 : id=0x7fd880c15700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23726859650451 now=23726859726954 diff=76503
 curr_task=0
  Thread 6 : id=0x7fd7d1488700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=2398264056 now=23982641100021 diff=100065
 curr_task=0
* Thread 7 : id=0x7fd7d0c87700 act=1 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=0 wantrdv=0
 cpu_ns: poll=23482398415461 now=23482398534664 diff=119203
 curr_task=0x5557150b7400 (task) calls=1 last=0
   fct=0x55570d2c8f40(task_run_applet) ctx=0x7fd7949a1a20()
 strm=0x7fd87c322d70 src=unix fe=GLOBAL be=GLOBAL dst=
 rqf=c48200 rqa=0 rpf=80008000 rpa=0 sif=EST,28
sib=EST,204018
 af=(nil),0 csf=0x555716465060,8200
 ab=0x7fd7949a1a20,9 csb=(nil),0

 cof=0x7fd791941e90,1300:PASS(0x7fd78411c1f0)/RAW((nil))/unix_stream(1271)
 cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)

  Thread 8 : id=0x7fd7d0486700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23504865560248 now=23504865701145 diff=140897
 curr_task=0
  Thread 9 : id=0x7fd7cfc85700 act=1 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=0 wantrdv=0
 cpu_ns: poll=24731954111541 now=24731954183811 diff=72270
 curr_task=0x7fd79e235bf0 (tasklet) calls=1301021937
   fct=0x55570d1c1390(h2_io_cb) ctx=0x7fd7b08d0530

  Thread 10: id=0x7fd7cf484700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23544763552254 now=23544763716578 diff=164324
 curr_task=0
  Thread 11: id=0x7fd7cec83700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23918013478538 now=23918013624911 diff=146373
 curr_task=0
  Thread 12: id=0x7fd7ce482700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23657530305687 now=23657530468631 diff=162944
 curr_task=0
  Thread 13: id=0x7fd7cdc81700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23491571084184 now=23491571212288 diff=128104
 curr_task=0
  Thread 14: id=0x7fd7cd480700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=24048462813219 now=24048462979499 diff=166280
 curr_task=0
  Thread 15: id=0x7fd7ccc7f700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23948328115728 now=23948328275800 diff=160072
 curr_task=0
  Thread 16: id=0x7fd7cc47e700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=24133419414190 now=24133419605024 diff=190834
 curr_task=0
  Thread 17: id=0x7fd7cbc7d700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=24124631248817 now=24124631415673 diff=166856
 curr_task=0
  Thread 18: id=0x7fd7cb47c700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23906001641575 now=23906001827864 diff=186289
 curr_task=0
  Thread 19: id=0x7fd7cac7b700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0
 stuck=0 prof=0 harmless=1 wantrdv=0
 cpu_ns: poll=23969313722856 now=23969313891707 diff=168851
 curr_task=0
  Thread 20: id=0x7fd7ca47a700 act=0 glob=0 wq=0 rq=0 tl=0 tlsz=0 rqsz=0