Re: 1.8.3: Slow posts on H2 (IE only?)
On Thu, Jan 04, 2018 at 04:03:29PM +0100, Peter Lindegaard Hansen wrote: > Hi Willy & Lukas, > > I did on a test VM using the latest git version > # haproxy -v > HA-Proxy version 1.8.3-646d23-1 2018/01/04 > Copyright 2000-2017 Willy Tarreau> > And can confirm that the issues related to post+redirects are fixed for us. > > And the other issue that we found seems to be resolved too. > > Perfect, thanks much! :) Oh that's great. Thanks for testing! Willy
Re: 1.8.3: Slow posts on H2 (IE only?)
Hi Willy & Lukas, I did on a test VM using the latest git version # haproxy -v HA-Proxy version 1.8.3-646d23-1 2018/01/04 Copyright 2000-2017 Willy TarreauAnd can confirm that the issues related to post+redirects are fixed for us. And the other issue that we found seems to be resolved too. Perfect, thanks much! :) Med venlig hilsen *Peter Lindegaard Hansen* *Softwareudvikler / Partner* Telefon: +45 96 500 300 | Direkte: 69 14 97 04 | Email: p...@tigermedia.dk Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk For supportspørgsmål kontakt os da på supp...@tigermedia.dk eller på tlf. 96 500 300 og din henvendelse vil blive besvaret af første ledige medarbejder. 2018-01-04 14:52 GMT+01:00 Willy Tarreau : > Hi Peter, > > On Wed, Jan 03, 2018 at 11:04:03PM +0100, Peter Lindegaard Hansen wrote: > > I will try to apply the patch (i am no expert at this) and see the > results. > > Well, don't waste your time trying to figure stuff you're not comfortable > with. I've just committed the fix and backported it. If you're familiar > with git, pulling the latest version will give you the fixed code. > Otherwise > just wait for tomorrow morning, the next nightly snapshot will > automatically > contain the fix. > > > I did find another h2-related issue, but where unable to pinpoint exactly > > why as it where deep in ajax application, but i will see with this patch > if > > its the same issue. > > OK, let's wait for new tests with the patch first, and otherwise once > you have more info, do not hesitate to open another report. > > Thanks! > Willy >
Re: 1.8.3: Slow posts on H2 (IE only?)
Hi Peter, On Wed, Jan 03, 2018 at 11:04:03PM +0100, Peter Lindegaard Hansen wrote: > I will try to apply the patch (i am no expert at this) and see the results. Well, don't waste your time trying to figure stuff you're not comfortable with. I've just committed the fix and backported it. If you're familiar with git, pulling the latest version will give you the fixed code. Otherwise just wait for tomorrow morning, the next nightly snapshot will automatically contain the fix. > I did find another h2-related issue, but where unable to pinpoint exactly > why as it where deep in ajax application, but i will see with this patch if > its the same issue. OK, let's wait for new tests with the patch first, and otherwise once you have more info, do not hesitate to open another report. Thanks! Willy
Re: 1.8.3: Slow posts on H2 (IE only?)
Hi, What a quick response on this issue :) I will try to apply the patch (i am no expert at this) and see the results. I did find another h2-related issue, but where unable to pinpoint exactly why as it where deep in ajax application, but i will see with this patch if its the same issue. Med venlig hilsen *Peter Lindegaard Hansen* *Softwareudvikler / Partner* Telefon: +45 96 500 300 | Direkte: 69 14 97 04 | Email: p...@tigermedia.dk Tiger Media A/S | Gl. Gugvej 17C | 9000 Aalborg | Web: www.tigermedia.dk For supportspørgsmål kontakt os da på supp...@tigermedia.dk eller på tlf. 96 500 300 og din henvendelse vil blive besvaret af første ledige medarbejder. 2018-01-03 22:34 GMT+01:00 Lukas Tribus: > Hello, > > > On Wed, Jan 3, 2018 at 9:51 PM, Willy Tarreau wrote: > > On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote: > >> Oh I think you've just put your finger on it. I remember taking care > >> of handling 0-sized frames, and facing certain difficulties with them > >> (eg: sometimes returning size 0 just means nothing was done). I sounds > >> very likely that we can still have a bug around this. It would also > >> explain why your patch could get rid of it. > >> > >> I'll have a look at the code in case I have an idea. > > > > Could you please try the attached patch? I'm pretty sure it is *very* > > related to your observations. In fact till now we would not update the > > parser's state on an empty data frame, which explains why you had to > > move the stuff around. > > Yes, I can confirm this fixes the problem for me, and in a good > looking way too: all streams are correctly handled within that single > connection (no sudden connection teardown and subsequent new > connection for follow-up streams). > > Now we won't even bother the ML subscribers with a 50+ messages > mega-thread debugging an obscure H2 issue :) > > > regards, > lukas >
Re: 1.8.3: Slow posts on H2 (IE only?)
Hello, On Wed, Jan 3, 2018 at 9:51 PM, Willy Tarreauwrote: > On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote: >> Oh I think you've just put your finger on it. I remember taking care >> of handling 0-sized frames, and facing certain difficulties with them >> (eg: sometimes returning size 0 just means nothing was done). I sounds >> very likely that we can still have a bug around this. It would also >> explain why your patch could get rid of it. >> >> I'll have a look at the code in case I have an idea. > > Could you please try the attached patch? I'm pretty sure it is *very* > related to your observations. In fact till now we would not update the > parser's state on an empty data frame, which explains why you had to > move the stuff around. Yes, I can confirm this fixes the problem for me, and in a good looking way too: all streams are correctly handled within that single connection (no sudden connection teardown and subsequent new connection for follow-up streams). Now we won't even bother the ML subscribers with a 50+ messages mega-thread debugging an obscure H2 issue :) regards, lukas
Re: 1.8.3: Slow posts on H2 (IE only?)
On Wed, Jan 03, 2018 at 09:31:47PM +0100, Willy Tarreau wrote: > Oh I think you've just put your finger on it. I remember taking care > of handling 0-sized frames, and facing certain difficulties with them > (eg: sometimes returning size 0 just means nothing was done). I sounds > very likely that we can still have a bug around this. It would also > explain why your patch could get rid of it. > > I'll have a look at the code in case I have an idea. Could you please try the attached patch? I'm pretty sure it is *very* related to your observations. In fact till now we would not update the parser's state on an empty data frame, which explains why you had to move the stuff around. Thanks! Willy diff --git a/src/mux_h2.c b/src/mux_h2.c index c6e15ec..6348826 100644 --- a/src/mux_h2.c +++ b/src/mux_h2.c @@ -2770,7 +2770,7 @@ static int h2_frt_transfer_data(struct h2s *h2s, struct buffer *buf, int count) flen = h2c->dfl - h2c->dpl; if (!flen) - return 0; + goto end_transfer; if (flen > h2c->dbuf->i) { flen = h2c->dbuf->i; @@ -2817,6 +2817,7 @@ static int h2_frt_transfer_data(struct h2s *h2s, struct buffer *buf, int count) return flen; } +end_transfer: /* here we're done with the frame, all the payload (except padding) was * transferred. */
Re: 1.8.3: Slow posts on H2 (IE only?)
On Wed, Jan 03, 2018 at 09:07:35PM +0100, Lukas Tribus wrote: > When the client sends the POST, haproxy responds with the 302 from the > backend (stream 3), the client follows the redirect sending a the new > GET (stream 5) which haproxy never sees. However, with this patch, > haproxy also sends GOAWAY (Last-Stream-ID = 3) and the entire > connection is killed. Because the browser understands that stream 5 > was never considered, the browser recovers from this situation by > resending the GET request in a new connection immediately. OK I understand, thus in fact it "hides" the problem. > Before my patch, after the the 302 redirect in stream 3, and the GET > request in stream 5, the connection basically hangs, and haproxy > doesn't 'see' stream 5 and the browser waits for a response, until > haproxy kills the connection (due to timeout). Interesting. It could indicate that haproxy is stuck in the FRAME_P state waiting for the data of the initial stream and for some reason never escapes. > I took a look of the behavior of Edge compared to other, unaffected > browsers and one thing caught my attention: > Edge sends 2 DATA frames, the first one with the actual POST data (no > ES bit set), the second DATA frame is empty (Length: 0) and has the ES > bit set. All other browsers just have one single DATA frame with both > the payload and the ES bit set. I assume the 0-length DATA frame may > be something haproxy is unable to handle and brings the entire > connection into a bogus state? Oh I think you've just put your finger on it. I remember taking care of handling 0-sized frames, and facing certain difficulties with them (eg: sometimes returning size 0 just means nothing was done). I sounds very likely that we can still have a bug around this. It would also explain why your patch could get rid of it. I'll have a look at the code in case I have an idea. Thanks! Willy
Re: 1.8.3: Slow posts on H2 (IE only?)
Hello, On Wed, Jan 3, 2018 at 5:56 PM, Willy Tarreauwrote: >> When moving the affected HREM code above the "return 0" branch, Edge >> works fine again. >> >> The attached patch fixes the issue for me, please give it a try. > > The problem with doing this is that when we have to stop transferring > data because for example the target buffer is full, we'll still switch > to HREM anyway and randomly destroy the last data frame. It was one of > the reasons some uploads were randomly truncated. > > But if it fixes the issue for you, it may indicate we're not properly > woken up sometimes during the upload. If I understand well, it only > happens on redirects, so maybe one transition is causing trouble here, > typically we may have an HLOC state (half-close(local)) when the response > is sent by the server before the client finishes to upload, and it's > possible that we don't continue to drain pending data there. With > your change it likely causes the stream to be aborted on this last > frame, unblocking the situation. That's just a wild guess, as usual :-/ In fact, with my patch the situation is not desirable at all: When the client sends the POST, haproxy responds with the 302 from the backend (stream 3), the client follows the redirect sending a the new GET (stream 5) which haproxy never sees. However, with this patch, haproxy also sends GOAWAY (Last-Stream-ID = 3) and the entire connection is killed. Because the browser understands that stream 5 was never considered, the browser recovers from this situation by resending the GET request in a new connection immediately. Before my patch, after the the 302 redirect in stream 3, and the GET request in stream 5, the connection basically hangs, and haproxy doesn't 'see' stream 5 and the browser waits for a response, until haproxy kills the connection (due to timeout). I took a look of the behavior of Edge compared to other, unaffected browsers and one thing caught my attention: Edge sends 2 DATA frames, the first one with the actual POST data (no ES bit set), the second DATA frame is empty (Length: 0) and has the ES bit set. All other browsers just have one single DATA frame with both the payload and the ES bit set. I assume the 0-length DATA frame may be something haproxy is unable to handle and brings the entire connection into a bogus state? Regards, Lukas
Re: 1.8.3: Slow posts on H2 (IE only?)
Hi Lukas, On Wed, Jan 03, 2018 at 05:34:36PM +0100, Lukas Tribus wrote: > Hello Peter, > > > On Wed, Jan 3, 2018 at 2:59 PM, Lukas Tribuswrote: > > I will come back later and take a deeper look at both strace and the > > capture. > > So, this is broken since c4134ba8b0 ("BUG/MEDIUM: h2: don't switch the > state to HREM before end of DATA frame"). > And indeed Edge sets the ES bit in the last DATA frame of the POST issue. They should all do that anyway since the ES bit marks the end of the stream. But it's possible there's a small difference between implementations leading to Edge triggering our bug easier. > When moving the affected HREM code above the "return 0" branch, Edge > works fine again. > > The attached patch fixes the issue for me, please give it a try. The problem with doing this is that when we have to stop transferring data because for example the target buffer is full, we'll still switch to HREM anyway and randomly destroy the last data frame. It was one of the reasons some uploads were randomly truncated. But if it fixes the issue for you, it may indicate we're not properly woken up sometimes during the upload. If I understand well, it only happens on redirects, so maybe one transition is causing trouble here, typically we may have an HLOC state (half-close(local)) when the response is sent by the server before the client finishes to upload, and it's possible that we don't continue to drain pending data there. With your change it likely causes the stream to be aborted on this last frame, unblocking the situation. That's just a wild guess, as usual :-/ Willy
Re: 1.8.3: Slow posts on H2 (IE only?)
Hello Peter, On Wed, Jan 3, 2018 at 2:59 PM, Lukas Tribuswrote: > I will come back later and take a deeper look at both strace and the capture. So, this is broken since c4134ba8b0 ("BUG/MEDIUM: h2: don't switch the state to HREM before end of DATA frame"). And indeed Edge sets the ES bit in the last DATA frame of the POST issue. When moving the affected HREM code above the "return 0" branch, Edge works fine again. The attached patch fixes the issue for me, please give it a try. cheers, lukas diff --git a/src/mux_h2.c b/src/mux_h2.c index c6e15ec..c3fc64e 100644 --- a/src/mux_h2.c +++ b/src/mux_h2.c @@ -1693,19 +1693,18 @@ static int h2c_frt_handle_data(struct h2c *h2c, struct h2s *h2s) h2c->st0 = H2_CS_FRAME_E; } - /* check for completion : the callee will change this to FRAME_A or -* FRAME_H once done. -*/ - if (h2c->st0 == H2_CS_FRAME_P) - return 0; - - /* last frame */ if (h2c->dff & H2_F_DATA_END_STREAM) { h2s->st = H2_SS_HREM; h2s->flags |= H2_SF_ES_RCVD; } + /* check for completion : the callee will change this to FRAME_A or +* FRAME_H once done. +*/ + if (h2c->st0 == H2_CS_FRAME_P) + return 0; + return 1; conn_err:
Re: 1.8.3: Slow posts on H2 (IE only?)
Hello Peter, On Wed, Jan 3, 2018 at 12:00 PM, Peter Lindegaard Hansenwrote: > > Hi List, > > We updated one of our haproxy boxes to the newly released 1.8.3 - thanks! > > Then we got reports of IE being slow, when we looked into the claims we found > that it seems to related to POSTs that respond in a header location redirect. > > We can reproduce the slowness with a ultra simple php script like this: Thanks, I can reproduce it. I don't see the CONNECT you are saying, but the behavior is as broken as it is on your end here. The problem is not the redirect itself, but the GET request after the 302 redirect. I was able to downgrade TLS to non-FS ciphers (Edge does not reject it in H2 as would be required by the specs), so I was able to decrypt it. Haproxy does not respond to the (post 302 redirect) GET request at all (it doesn't even show up in the debug log). Attaching capture, private SSL key (invalid certificate) and strace output. I did not go through the strace output yet and I couldn't find anything wrong with the browsers request in the capture. I'm posting this data now here. I will come back later and take a deeper look at both strace and the capture. The private key needs to be configured for 10.0.0.33 port 443 protocol http in Wireshark (Preferences/Protocols/SSL/RSA Keys), for the session to be decrypted. cheers, lukas lukas@dev:~/haproxy$ sudo strace -tt ./haproxy -f ../haproxy.cfg -d 14:36:22.959623 execve("./haproxy", ["./haproxy", "-f", "../haproxy.cfg", "-d"], [/* 15 vars */]) = 0 14:36:22.960767 brk(NULL) = 0x2981000 14:36:22.961097 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:36:22.961349 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce5ee24000 14:36:22.961598 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) 14:36:22.961911 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 14:36:22.962149 fstat(3, {st_mode=S_IFREG|0644, st_size=24208, ...}) = 0 14:36:22.962427 mmap(NULL, 24208, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fce5ee1e000 14:36:22.962642 close(3)= 0 14:36:22.962920 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:36:22.963137 open("/lib/x86_64-linux-gnu/libcrypt.so.1", O_RDONLY|O_CLOEXEC) = 3 14:36:22.963340 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\v\0\0\0\0\0\0"..., 832) = 832 14:36:22.963623 fstat(3, {st_mode=S_IFREG|0644, st_size=39224, ...}) = 0 14:36:22.963839 mmap(NULL, 2322912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e9c9000 14:36:22.964050 mprotect(0x7fce5e9d2000, 2093056, PROT_NONE) = 0 14:36:22.964273 mmap(0x7fce5ebd1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7fce5ebd1000 14:36:22.964543 mmap(0x7fce5ebd3000, 184800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fce5ebd3000 14:36:22.964764 close(3)= 0 14:36:22.964993 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:36:22.965211 open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 14:36:22.965449 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260`\0\0\0\0\0\0"..., 832) = 832 14:36:22.965647 fstat(3, {st_mode=S_IFREG|0755, st_size=138696, ...}) = 0 14:36:22.965937 mmap(NULL, 2212904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e7ac000 14:36:22.966147 mprotect(0x7fce5e7c4000, 2093056, PROT_NONE) = 0 14:36:22.966355 mmap(0x7fce5e9c3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7fce5e9c3000 14:36:22.966609 mmap(0x7fce5e9c5000, 13352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fce5e9c5000 14:36:22.966827 close(3)= 0 14:36:22.967136 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:36:22.967388 open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3 14:36:22.967686 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832 14:36:22.967901 fstat(3, {st_mode=S_IFREG|0644, st_size=14608, ...}) = 0 14:36:22.968135 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fce5ee1d000 14:36:22.968346 mmap(NULL, 2109680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fce5e5a8000 14:36:22.968549 mprotect(0x7fce5e5ab000, 2093056, PROT_NONE) = 0 14:36:22.968732 mmap(0x7fce5e7aa000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fce5e7aa000 14:36:22.968953 close(3)= 0 14:36:22.969189 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) 14:36:22.969403 open("/usr/lib/x86_64-linux-gnu/liblua5.3.so.0", O_RDONLY|O_CLOEXEC) = 3 14:36:22.969627 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0pi\0\0\0\0\0\0"..., 832) = 832 14:36:22.969867 fstat(3, {st_mode=S_IFREG|0644, st_size=224104, ...}) = 0