Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-04 Thread Willy Tarreau
On Thu, Jan 04, 2018 at 04:03:29PM +0100, Peter Lindegaard Hansen wrote:
> Hi Willy & Lukas,
> 
> I did on a test VM using the latest git version
> # haproxy -v
> HA-Proxy version 1.8.3-646d23-1 2018/01/04
> Copyright 2000-2017 Willy Tarreau 
> 
> And can confirm that the issues related to post+redirects are fixed for us.
> 
> And the other issue that we found seems to be resolved too.
> 
> Perfect, thanks much! :)

Oh that's great. Thanks for testing!

Willy



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-04 Thread Peter Lindegaard Hansen
Hi Willy & Lukas,

I did on a test VM using the latest git version
# haproxy -v
HA-Proxy version 1.8.3-646d23-1 2018/01/04
Copyright 2000-2017 Willy Tarreau 

And can confirm that the issues related to post+redirects are fixed for us.

And the other issue that we found seems to be resolved too.

Perfect, thanks much! :)



Med venlig hilsen


*Peter Lindegaard Hansen*

*Softwareudvikler / Partner*

Telefon: +45 96 500 300 | Direkte: 69 14 97 04 | Email: p...@tigermedia.dk
Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk

For supportspørgsmål kontakt os da på supp...@tigermedia.dk eller på tlf.
96 500 300
og din henvendelse vil blive besvaret af første ledige medarbejder.

2018-01-04 14:52 GMT+01:00 Willy Tarreau :

> Hi Peter,
>
> On Wed, Jan 03, 2018 at 11:04:03PM +0100, Peter Lindegaard Hansen wrote:
> > I will try to apply the patch (i am no expert at this) and see the
> results.
>
> Well, don't waste your time trying to figure stuff you're not comfortable
> with. I've just committed the fix and backported it. If you're familiar
> with git, pulling the latest version will give you the fixed code.
> Otherwise
> just wait for tomorrow morning, the next nightly snapshot will
> automatically
> contain the fix.
>
> > I did find another h2-related issue, but where unable to pinpoint exactly
> > why as it where deep in ajax application, but i will see with this patch
> if
> > its the same issue.
>
> OK, let's wait for new tests with the patch first, and otherwise once
> you have more info, do not hesitate to open another report.
>
> Thanks!
> Willy
>


Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-04 Thread Willy Tarreau
Hi Peter,

On Wed, Jan 03, 2018 at 11:04:03PM +0100, Peter Lindegaard Hansen wrote:
> I will try to apply the patch (i am no expert at this) and see the results.

Well, don't waste your time trying to figure stuff you're not comfortable
with. I've just committed the fix and backported it. If you're familiar
with git, pulling the latest version will give you the fixed code. Otherwise
just wait for tomorrow morning, the next nightly snapshot will automatically
contain the fix.

> I did find another h2-related issue, but where unable to pinpoint exactly
> why as it where deep in ajax application, but i will see with this patch if
> its the same issue.

OK, let's wait for new tests with the patch first, and otherwise once
you have more info, do not hesitate to open another report.

Thanks!
Willy



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Peter Lindegaard Hansen
Hi,

What a quick response on this issue :)

I will try to apply the patch (i am no expert at this) and see the results.
I did find another h2-related issue, but where unable to pinpoint exactly
why as it where deep in ajax application, but i will see with this patch if
its the same issue.





Med venlig hilsen


*Peter Lindegaard Hansen*

*Softwareudvikler / Partner*

Telefon: +45 96 500 300 | Direkte: 69 14 97 04 | Email: p...@tigermedia.dk
Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk

For supportspørgsmål kontakt os da på supp...@tigermedia.dk eller på tlf.
96 500 300
og din henvendelse vil blive besvaret af første ledige medarbejder.

2018-01-03 22:34 GMT+01:00 Lukas Tribus :

> Hello,
>
>
> On Wed, Jan 3, 2018 at 9:51 PM, Willy Tarreau  wrote:
> > On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote:
> >> Oh I think you've just put your finger on it. I remember taking care
> >> of handling 0-sized frames, and facing certain difficulties with them
> >> (eg: sometimes returning size 0 just means nothing was done). I sounds
> >> very likely that we can still have a bug around this. It would also
> >> explain why your patch could get rid of it.
> >>
> >> I'll have a look at the code in case I have an idea.
> >
> > Could you please try the attached patch? I'm pretty sure it is *very*
> > related to your observations. In fact till now we would not update the
> > parser's state on an empty data frame, which explains why you had to
> > move the stuff around.
>
> Yes, I can confirm this fixes the problem for me, and in a good
> looking way too: all streams are correctly handled within that single
> connection (no sudden connection teardown and subsequent new
> connection for follow-up streams).
>
> Now we won't even bother the ML subscribers with a 50+ messages
> mega-thread debugging an obscure H2 issue :)
>
>
> regards,
> lukas
>


Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Lukas Tribus
Hello,


On Wed, Jan 3, 2018 at 9:51 PM, Willy Tarreau  wrote:
> On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote:
>> Oh I think you've just put your finger on it. I remember taking care
>> of handling 0-sized frames, and facing certain difficulties with them
>> (eg: sometimes returning size 0 just means nothing was done). I sounds
>> very likely that we can still have a bug around this. It would also
>> explain why your patch could get rid of it.
>>
>> I'll have a look at the code in case I have an idea.
>
> Could you please try the attached patch? I'm pretty sure it is *very*
> related to your observations. In fact till now we would not update the
> parser's state on an empty data frame, which explains why you had to
> move the stuff around.

Yes, I can confirm this fixes the problem for me, and in a good
looking way too: all streams are correctly handled within that single
connection (no sudden connection teardown and subsequent new
connection for follow-up streams).

Now we won't even bother the ML subscribers with a 50+ messages
mega-thread debugging an obscure H2 issue :)


regards,
lukas



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Willy Tarreau
On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote:
> Oh I think you've just put your finger on it. I remember taking care
> of handling 0-sized frames, and facing certain difficulties with them
> (eg: sometimes returning size 0 just means nothing was done). I sounds
> very likely that we can still have a bug around this. It would also
> explain why your patch could get rid of it.
> 
> I'll have a look at the code in case I have an idea.

Could you please try the attached patch? I'm pretty sure it is *very*
related to your observations. In fact till now we would not update the
parser's state on an empty data frame, which explains why you had to
move the stuff around.

Thanks!
Willy
diff --git a/src/mux_h2.c b/src/mux_h2.c
index c6e15ec..6348826 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -2770,7 +2770,7 @@ static int h2_frt_transfer_data(struct h2s *h2s, struct 
buffer *buf, int count)
 
flen = h2c->dfl - h2c->dpl;
if (!flen)
-   return 0;
+   goto end_transfer;
 
if (flen > h2c->dbuf->i) {
flen = h2c->dbuf->i;
@@ -2817,6 +2817,7 @@ static int h2_frt_transfer_data(struct h2s *h2s, struct 
buffer *buf, int count)
return flen;
}
 
+end_transfer:
/* here we're done with the frame, all the payload (except padding) was
 * transferred.
 */


Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Willy Tarreau
On Wed, Jan 03, 2018 at 09:07:35PM +0100, Lukas Tribus wrote:
> When the client sends the POST, haproxy responds with the 302 from the
> backend (stream 3), the client follows the redirect sending a the new
> GET (stream 5) which haproxy never sees. However, with this patch,
> haproxy also sends GOAWAY (Last-Stream-ID = 3) and the entire
> connection is killed. Because the browser understands that stream 5
> was never considered, the browser recovers from this situation by
> resending the GET request in a new connection immediately.

OK I understand, thus in fact it "hides" the problem.

> Before my patch, after the the 302 redirect in stream 3, and the GET
> request in stream 5, the connection basically hangs, and haproxy
> doesn't 'see' stream 5 and the browser waits for a response, until
> haproxy kills the connection (due to timeout).

Interesting. It could indicate that haproxy is stuck in the FRAME_P state
waiting for the data of the initial stream and for some reason never
escapes.

> I took a look of the behavior of Edge compared to other, unaffected
> browsers and one thing caught my attention:
> Edge sends 2 DATA frames, the first one with the actual POST data (no
> ES bit set), the second DATA frame is empty (Length: 0) and has the ES
> bit set. All other browsers just have one single DATA frame with both
> the payload and the ES bit set. I assume the 0-length DATA frame may
> be something haproxy is unable to handle and brings the entire
> connection into a bogus state?

Oh I think you've just put your finger on it. I remember taking care
of handling 0-sized frames, and facing certain difficulties with them
(eg: sometimes returning size 0 just means nothing was done). I sounds
very likely that we can still have a bug around this. It would also
explain why your patch could get rid of it.

I'll have a look at the code in case I have an idea.

Thanks!
Willy



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Lukas Tribus
Hello,


On Wed, Jan 3, 2018 at 5:56 PM, Willy Tarreau  wrote:
>> When moving the affected HREM code above the "return 0" branch, Edge
>> works fine again.
>>
>> The attached patch fixes the issue for me, please give it a try.
>
> The problem with doing this is that when we have to stop transferring
> data because for example the target buffer is full, we'll still switch
> to HREM anyway and randomly destroy the last data frame. It was one of
> the reasons some uploads were randomly truncated.
>
> But if it fixes the issue for you, it may indicate we're not properly
> woken up sometimes during the upload. If I understand well, it only
> happens on redirects, so maybe one transition is causing trouble here,
> typically we may have an HLOC state (half-close(local)) when the response
> is sent by the server before the client finishes to upload, and it's
> possible that we don't continue to drain pending data there. With
> your change it likely causes the stream to be aborted on this last
> frame, unblocking the situation. That's just a wild guess, as usual :-/

In fact, with my patch the situation is not desirable at all:

When the client sends the POST, haproxy responds with the 302 from the
backend (stream 3), the client follows the redirect sending a the new
GET (stream 5) which haproxy never sees. However, with this patch,
haproxy also sends GOAWAY (Last-Stream-ID = 3) and the entire
connection is killed. Because the browser understands that stream 5
was never considered, the browser recovers from this situation by
resending the GET request in a new connection immediately.

Before my patch, after the the 302 redirect in stream 3, and the GET
request in stream 5, the connection basically hangs, and haproxy
doesn't 'see' stream 5 and the browser waits for a response, until
haproxy kills the connection (due to timeout).


I took a look of the behavior of Edge compared to other, unaffected
browsers and one thing caught my attention:
Edge sends 2 DATA frames, the first one with the actual POST data (no
ES bit set), the second DATA frame is empty (Length: 0) and has the ES
bit set. All other browsers just have one single DATA frame with both
the payload and the ES bit set. I assume the 0-length DATA frame may
be something haproxy is unable to handle and brings the entire
connection into a bogus state?


Regards,
Lukas



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Willy Tarreau
Hi Lukas,

On Wed, Jan 03, 2018 at 05:34:36PM +0100, Lukas Tribus wrote:
> Hello Peter,
> 
> 
> On Wed, Jan 3, 2018 at 2:59 PM, Lukas Tribus  wrote:
> > I will come back later and take a deeper look at both strace and the 
> > capture.
> 
> So, this is broken since c4134ba8b0 ("BUG/MEDIUM: h2: don't switch the
> state to HREM before end of DATA frame").
> And indeed Edge sets the ES bit in the last DATA frame of the POST issue.

They should all do that anyway since the ES bit marks the end of the stream.
But it's possible there's a small difference between implementations leading
to Edge triggering our bug easier.

> When moving the affected HREM code above the "return 0" branch, Edge
> works fine again.
> 
> The attached patch fixes the issue for me, please give it a try.

The problem with doing this is that when we have to stop transferring
data because for example the target buffer is full, we'll still switch
to HREM anyway and randomly destroy the last data frame. It was one of
the reasons some uploads were randomly truncated.

But if it fixes the issue for you, it may indicate we're not properly
woken up sometimes during the upload. If I understand well, it only
happens on redirects, so maybe one transition is causing trouble here,
typically we may have an HLOC state (half-close(local)) when the response
is sent by the server before the client finishes to upload, and it's
possible that we don't continue to drain pending data there. With
your change it likely causes the stream to be aborted on this last
frame, unblocking the situation. That's just a wild guess, as usual :-/

Willy



Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Lukas Tribus
Hello Peter,


On Wed, Jan 3, 2018 at 2:59 PM, Lukas Tribus  wrote:
> I will come back later and take a deeper look at both strace and the capture.

So, this is broken since c4134ba8b0 ("BUG/MEDIUM: h2: don't switch the
state to HREM before end of DATA frame").
And indeed Edge sets the ES bit in the last DATA frame of the POST issue.

When moving the affected HREM code above the "return 0" branch, Edge
works fine again.


The attached patch fixes the issue for me, please give it a try.


cheers,
lukas
diff --git a/src/mux_h2.c b/src/mux_h2.c
index c6e15ec..c3fc64e 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -1693,19 +1693,18 @@ static int h2c_frt_handle_data(struct h2c *h2c, struct 
h2s *h2s)
h2c->st0 = H2_CS_FRAME_E;
}
 
-   /* check for completion : the callee will change this to FRAME_A or
-* FRAME_H once done.
-*/
-   if (h2c->st0 == H2_CS_FRAME_P)
-   return 0;
-
-
/* last frame */
if (h2c->dff & H2_F_DATA_END_STREAM) {
h2s->st = H2_SS_HREM;
h2s->flags |= H2_SF_ES_RCVD;
}
 
+   /* check for completion : the callee will change this to FRAME_A or
+* FRAME_H once done.
+*/
+   if (h2c->st0 == H2_CS_FRAME_P)
+   return 0;
+
return 1;
 
  conn_err:


Re: 1.8.3: Slow posts on H2 (IE only?)

2018-01-03 Thread Lukas Tribus
Hello Peter,


On Wed, Jan 3, 2018 at 12:00 PM, Peter Lindegaard Hansen
 wrote:
>
> Hi List,
>
> We updated one of our haproxy boxes to the newly released 1.8.3 - thanks!
>
> Then we got reports of IE being slow, when we looked into the claims we found 
> that it seems to related to POSTs that respond in a header location redirect.
>
> We can reproduce the slowness with a ultra simple php script like this:

Thanks, I can reproduce it.

I don't see the CONNECT you are saying, but the behavior is as broken
as it is on your end here. The problem is not the redirect itself, but
the GET request after the 302 redirect.
I was able to downgrade TLS to non-FS ciphers (Edge does not reject it
in H2 as would be required by the specs), so I was able to decrypt it.
Haproxy does not respond to the (post 302 redirect) GET request at all
(it doesn't even show up in the debug log).

Attaching capture, private SSL key (invalid certificate) and strace output.


I did not go through the strace output yet and I couldn't find
anything wrong with the browsers request in the capture. I'm posting
this data now here. I will come back later and take a deeper look at
both strace and the capture.


The private key needs to be configured for 10.0.0.33 port 443 protocol
http in Wireshark (Preferences/Protocols/SSL/RSA Keys), for the
session to be decrypted.




cheers,
lukas
lukas@dev:~/haproxy$ sudo strace -tt ./haproxy -f ../haproxy.cfg -d
14:36:22.959623 execve("./haproxy", ["./haproxy", "-f", "../haproxy.cfg", 
"-d"], [/* 15 vars */]) = 0
14:36:22.960767 brk(NULL)   = 0x2981000
14:36:22.961097 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.961349 mmap(NULL, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce5ee24000
14:36:22.961598 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.961911 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
14:36:22.962149 fstat(3, {st_mode=S_IFREG|0644, st_size=24208, ...}) = 0
14:36:22.962427 mmap(NULL, 24208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fce5ee1e000
14:36:22.962642 close(3)= 0
14:36:22.962920 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.963137 open("/lib/x86_64-linux-gnu/libcrypt.so.1", O_RDONLY|O_CLOEXEC) 
= 3
14:36:22.963340 read(3, 
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\v\0\0\0\0\0\0"..., 832) = 832
14:36:22.963623 fstat(3, {st_mode=S_IFREG|0644, st_size=39224, ...}) = 0
14:36:22.963839 mmap(NULL, 2322912, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e9c9000
14:36:22.964050 mprotect(0x7fce5e9d2000, 2093056, PROT_NONE) = 0
14:36:22.964273 mmap(0x7fce5ebd1000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7fce5ebd1000
14:36:22.964543 mmap(0x7fce5ebd3000, 184800, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fce5ebd3000
14:36:22.964764 close(3)= 0
14:36:22.964993 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.965211 open("/lib/x86_64-linux-gnu/libpthread.so.0", 
O_RDONLY|O_CLOEXEC) = 3
14:36:22.965449 read(3, 
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832
14:36:22.965647 fstat(3, {st_mode=S_IFREG|0755, st_size=138696, ...}) = 0
14:36:22.965937 mmap(NULL, 2212904, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e7ac000
14:36:22.966147 mprotect(0x7fce5e7c4000, 2093056, PROT_NONE) = 0
14:36:22.966355 mmap(0x7fce5e9c3000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7fce5e9c3000
14:36:22.966609 mmap(0x7fce5e9c5000, 13352, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fce5e9c5000
14:36:22.966827 close(3)= 0
14:36:22.967136 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.967388 open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
14:36:22.967686 read(3, 
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 
832
14:36:22.967901 fstat(3, {st_mode=S_IFREG|0644, st_size=14608, ...}) = 0
14:36:22.968135 mmap(NULL, 4096, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce5ee1d000
14:36:22.968346 mmap(NULL, 2109680, PROT_READ|PROT_EXEC, 
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e5a8000
14:36:22.968549 mprotect(0x7fce5e5ab000, 2093056, PROT_NONE) = 0
14:36:22.968732 mmap(0x7fce5e7aa000, 8192, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fce5e7aa000
14:36:22.968953 close(3)= 0
14:36:22.969189 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or 
directory)
14:36:22.969403 open("/usr/lib/x86_64-linux-gnu/liblua5.3.so.0", 
O_RDONLY|O_CLOEXEC) = 3
14:36:22.969627 read(3, 
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0pi\0\0\0\0\0\0"..., 832) = 832
14:36:22.969867 fstat(3, {st_mode=S_IFREG|0644, st_size=224104, ...}) = 0