Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)
Hi Wert, On Thu, Jan 17, 2019 at 01:38:52PM +0300, Wert wrote: > It makes large downloads completely unusable even with not very often reloads. Yesterday while addressing some H2 issues, I found a bug which can cause exactly what you were seeing. I'm interested in knowing whether the latest 1.9 fixes this for you : http://git.haproxy.org/?p=haproxy-1.9.git;a=snapshot;h=HEAD;sf=tgz Thanks, Willy
Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)
Hi > Hi, > On Wed, Jan 16, 2019 at 10:32:00AM +0300, Wert wrote: >> How to reproduce: >> 1. Start browser-download (content-disposition: attachment) of some big file >> through H2 >> * Tested with 1Gb file and several Chrome-versions (67-) >> 2. Make reload >> 3. Process with this connection would stay, transfer everything successfully >> and even send to log info about code-200 full-size transfer without any >> termination flags >> 4. Chrome would show "Network error" at the end of transfer >> >> Versions: >> 1.8.17 - it happens in all cases >> #c4d56e5 - looks like it happens only in some conditions, but just because >> there is another bug that makes old processes stay much longer than with >> 1.8.17 >> >> Fix: >> For 1.8.17 I just disabled condition "if (sess && unlikely(sess->fe->state >> == PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330 >> It is not correct way and probably breaks something else, but fixes this > I understand what's happening. And it is caused by the H2 state machine > which does not have provisions for something equivalent to the LAST_ACK > TCP state. A stream is closed as soon as the END_STREAM flag has been > seen in both directions. Thus during such a reload, we end up with a > connection which doesn't have any stream anymore but still bytes in > flight for the last stream(s). Since we have a reload in progress, we > know the process is waiting for old connections to leave, thus we close > this idle and unused connection. The problem is that the client is still > sending WINDOW_UPDATES for the data it receives, and that these ones will > cause a TCP RST to be emitted. Depending on the packet ordering on the > path and on the remote TCP stack, this can lead to truncated responses, > or even in a client reporting an error after it correctly gets everything. > I'd like to study the possibility to introduce an equivalent of the LAST_ACK > state for streams. In theory it should not be very difficult but in practice > it's tricky because frames are not really ACKed, the client sends us more > budget to send more data, so we never exactly know when all data were received > on the other end, though it serves as a hint. Ideally we could consider that > we can watch the window increments only and compare them to the amount of > data sent. > I think it's something I'll bring to the IETF HTTP working group, because > it could actually serve to address another issue related to priorities > which are sometimes incorrectly re-attached when streams disappear. > For the short term however I don't have any solution to this, and tricks > such as the one above are indeed risky, you may face connections which > never die :-/ > Best regards, > Willy Thank you for answer. It makes large downloads completely unusable even with not very often reloads. "Never die" connection looks not so big problem since it is stopping process and "hard-stop-after" can deal with it. It is less evil for many cases. How about optional delayed-processing of h2c_send_goaway_error (exclusively for stopping processes)? Like separate queue for "connections awaiting send of goaway-error" -- Wert
Re: Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)
Hi, On Wed, Jan 16, 2019 at 10:32:00AM +0300, Wert wrote: > How to reproduce: > 1. Start browser-download (content-disposition: attachment) of some big file > through H2 > * Tested with 1Gb file and several Chrome-versions (67-) > 2. Make reload > 3. Process with this connection would stay, transfer everything successfully > and even send to log info about code-200 full-size transfer without any > termination flags > 4. Chrome would show "Network error" at the end of transfer > > Versions: > 1.8.17 - it happens in all cases > #c4d56e5 - looks like it happens only in some conditions, but just because > there is another bug that makes old processes stay much longer than with > 1.8.17 > > Fix: > For 1.8.17 I just disabled condition "if (sess && unlikely(sess->fe->state == > PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330 > It is not correct way and probably breaks something else, but fixes this I understand what's happening. And it is caused by the H2 state machine which does not have provisions for something equivalent to the LAST_ACK TCP state. A stream is closed as soon as the END_STREAM flag has been seen in both directions. Thus during such a reload, we end up with a connection which doesn't have any stream anymore but still bytes in flight for the last stream(s). Since we have a reload in progress, we know the process is waiting for old connections to leave, thus we close this idle and unused connection. The problem is that the client is still sending WINDOW_UPDATES for the data it receives, and that these ones will cause a TCP RST to be emitted. Depending on the packet ordering on the path and on the remote TCP stack, this can lead to truncated responses, or even in a client reporting an error after it correctly gets everything. I'd like to study the possibility to introduce an equivalent of the LAST_ACK state for streams. In theory it should not be very difficult but in practice it's tricky because frames are not really ACKed, the client sends us more budget to send more data, so we never exactly know when all data were received on the other end, though it serves as a hint. Ideally we could consider that we can watch the window increments only and compare them to the amount of data sent. I think it's something I'll bring to the IETF HTTP working group, because it could actually serve to address another issue related to priorities which are sometimes incorrectly re-attached when streams disappear. For the short term however I don't have any solution to this, and tricks such as the one above are indeed risky, you may face connections which never die :-/ Best regards, Willy
Browser downloads failing because of h2c_send_goaway_error (1.8.17 + 1.9.1)
How to reproduce: 1. Start browser-download (content-disposition: attachment) of some big file through H2 * Tested with 1Gb file and several Chrome-versions (67-) 2. Make reload 3. Process with this connection would stay, transfer everything successfully and even send to log info about code-200 full-size transfer without any termination flags 4. Chrome would show "Network error" at the end of transfer Versions: 1.8.17 - it happens in all cases #c4d56e5 - looks like it happens only in some conditions, but just because there is another bug that makes old processes stay much longer than with 1.8.17 Fix: For 1.8.17 I just disabled condition "if (sess && unlikely(sess->fe->state == PR_STSTOPPED))" leading to h2c_send_goaway_error() in mux_h2.c:2330 It is not correct way and probably breaks something else, but fixes this Tcpdump for end of failed download: 04:28:16.947674 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189837535, win 3645, length 0 04:28:16.947685 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190016095:1190018975, ack 22882, win 254, length 2880 04:28:16.947695 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190018975:1190021855, ack 22882, win 254, length 2880 04:28:16.950105 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189843295, win 3645, length 0 04:28:16.950117 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190021855:1190024735, ack 22882, win 254, length 2880 04:28:16.950127 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190024735:1190027615, ack 22882, win 254, length 2880 04:28:16.952186 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189847615, win 3645, length 0 04:28:16.952194 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190027615:1190030495, ack 22882, win 254, length 2880 04:28:16.952202 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190030495:1190033375, ack 22882, win 254, length 2880 04:28:16.954424 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189853375, win 3645, length 0 04:28:16.954430 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190033375:1190036255, ack 22882, win 254, length 2880 04:28:16.954438 IP server.example.com.https > client.example.com.53228: Flags [.], seq 1190036255:1190039135, ack 22882, win 254, length 2880 04:28:16.954441 IP client.example.com.53228 > server.example.com.https: Flags [P.], seq 22882:22924, ack 1189853375, win 3645, length 42 04:28:16.954496 IP server.example.com.https > client.example.com.53228: Flags [R], seq 1015944401, win 0, length 0 04:28:16.954690 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189856255, win 3645, length 0 04:28:16.954704 IP server.example.com.https > client.example.com.53228: Flags [R], seq 1015947281, win 0, length 0 04:28:16.959056 IP client.example.com.53228 > server.example.com.https: Flags [.], ack 1189862015, win 3645, length 0 ... --- Wert