Re: Early connection close, incomplete transfers
On 2019-02-19 06:47, Willy Tarreau wrote:
> This is interesting. As you observed in the trace you sent me, the lighttpd
> server closes just after sending the response headers. This indeed matches
> the "SD" log that haproxy emits. If it doesn't happen in TCP mode nor with
> Nginx, it means that something haproxy modifies in the request causes this
> effect on the server.

Hi, I'm forwarding the answer from a colleague who investigated this more thoroughly, especially on the lighttpd side:

We have been debugging this a bit further, and it does not look like the seemingly random incomplete HTTP responses are caused by any particular request headers at the HTTP layer. It rather looks like something at the TCP level (so specific to HTTP mode).

A first observation we made is that the frequency of these incomplete transfers increases when we add a delay at the backend server after sending the response headers and before sending the body data. We added a 100 ms delay there and then got a lot of interrupted transfers that had received only the response headers (sent before the delay) but 0 bytes of the body (sent just after the delay). So the frequency with which this happens appears to be proportional to latencies/stalls in the backend server sending the response data (some read timeout logic in haproxy?).

We debugged further and noticed that in all cases where transfers were incomplete, our lighttpd backend server received an EPOLLRDHUP event on the socket where it communicates with haproxy. So it appears as if haproxy *sometimes* (apparently depending on some read latency/stall, see above) shuts down its socket to the backend for writing *before* the full response headers and body data have been received. This in itself would basically be OK, because the socket remains writable for lighttpd, so it could still send the rest of the response data.
However, it looks like lighttpd is not expecting this kind of behavior from the client and does not correctly handle such a half-closed TCP session. There is code in lighttpd to handle such an EPOLLRDHUP event and half-closed TCP connection, but lighttpd then also checks the state of the TCP session with getsockopt and keeps the connection open *only* when the state is TCP_CLOSE_WAIT. In all other cases, upon receiving the EPOLLRDHUP it actively changes the state of the connection to "ERROR" and then closes the connection:

https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L908
https://github.com/lighttpd/lighttpd1.4/blob/master/src/fdevent.c#L995

We checked, and every time we get an incomplete response, lighttpd receives the EPOLLRDHUP event on the socket, but the TCP state queried via getsockopt is always TCP_CLOSE (and not TCP_CLOSE_WAIT as lighttpd seems to expect). Because of this, lighttpd then actively closes the half-closed connection from its end as well (which is likely the cause of the TCP FIN sent by lighttpd, as seen in the tcpdump).

When we remove the condition in lighttpd that marks the connection as erroneous in case of EPOLLRDHUP with a TCP state other than TCP_CLOSE_WAIT, the problem with the incomplete transfers disappears:

https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L922

We do not understand why this is, or what the correct reaction to the EPOLLRDHUP event should be. In particular, we do not understand why lighttpd performs this check for TCP_CLOSE_WAIT, or why we always get a state of TCP_CLOSE when we receive this event even though the socket continues to be writable (does the TCP_CLOSE just indicate that one direction of the connection is closed?).
Still, because this half-closing of the connection to the backend server appears to happen fairly randomly, depending on latency/stalls in the backend server sending the response data, we assume that this is not the intended behavior of haproxy (and so it possibly indicates a bug in haproxy too). We assume that the reason direct requests to the backend server, or requests proxied via Nginx, never fail is that in those cases the EPOLLRDHUP event never occurs and there are never half-closed connections. However, we have not tested this yet, i.e. we did not re-test with Nginx to verify that lighttpd then indeed never sees an EPOLLRDHUP.

Any ideas or suggestions, based on these findings, about what the proper solution to the problem should be? Thank you.
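The half-close behavior described above can be reproduced outside of haproxy/lighttpd. The following is a minimal loopback sketch (hypothetical, not the actual code of either project): the client shuts down only its write direction, the server sees EOF on read (which is what EPOLLRDHUP signals at the epoll level), yet the server's write direction stays usable and the response still gets through. On Linux, querying TCP_INFO at that point should show TCP_CLOSE_WAIT (state 8), the state lighttpd checks for.

```python
import socket
import threading
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
addr = srv.getsockname()
result = {}

def server():
    conn, _ = srv.accept()
    conn.recv(1024)                  # read the "request"
    # A blocking recv() returning b"" is the stream-level view of the
    # peer's FIN, i.e. what arrives as EPOLLRDHUP under epoll.
    while conn.recv(1024):
        pass
    if hasattr(socket, "TCP_INFO"):  # Linux: first byte is tcpi_state
        info = conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
        result["state"] = info[0]    # 8 == TCP_CLOSE_WAIT
    conn.sendall(b"response body")   # our write side is still open
    conn.close()

t = threading.Thread(target=server)
t.start()
cli = socket.create_connection(addr)
cli.sendall(b"GET")
time.sleep(0.2)
cli.shutdown(socket.SHUT_WR)         # half-close: client sends FIN only
data = b""
while True:                          # the client can still read the reply
    chunk = cli.recv(1024)
    if not chunk:
        break
    data += chunk
t.join()
cli.close()
srv.close()
```

In this sketch the server, like a patched lighttpd, simply keeps writing after the EOF instead of treating the half-close as an error.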
Re: Early connection close, incomplete transfers
Hi Veiko,

On Thu, Feb 14, 2019 at 02:31:42PM +, Veiko Kukk wrote:
> On 2019-02-01 13:30, Veiko Kukk wrote:
> > On 2019-02-01 12:34, Aleksandar Lazic wrote:
> > >
> > > Do you have any errors in lighthttpds log?
> >
> > Yes, it has error messages about not being able to write to the socket.
> >
> > Unrecoverable error writing to socket! errno 32, retries 12, ppoll
> > return 1, send return -1
> > ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
> >
> > I've tested with several hundred thousand requests, but it never
> > happens when using "mode tcp".
>
> Replaced HAproxy with Nginx for testing and with Nginx, not a single
> connection was interrupted, did millions of requests.

This is interesting. As you observed in the trace you sent me, the lighttpd server closes just after sending the response headers. This indeed matches the "SD" log that haproxy emits. If it doesn't happen in TCP mode nor with Nginx, it means that something haproxy modifies in the request causes this effect on the server.

The request you isolated looks like this:

GET /video/_redacted_ HTTP/1.1
Host: 127.0.0.1
X-Request-Id: XXX_redacted_XXX
Connection: close
X-Forwarded-For: 127.0.0.1

So what haproxy has apparently modified here is:
- the X-Request-Id header field that was appended;
- the "Connection: close" header field, because you likely have "option http-server-close" in the config;
- the "X-Forwarded-For" header field (option forwardfor).

Among them I hardly see why the request-id or XFF could cause any issue, but we could imagine that for some reason the server occasionally has trouble with "Connection: close". If you have "option http-server-close" in your config, it would be nice to remove it and see if it changes anything.
If you don't have it because the request already arrives with close (but then I hardly understand how it would work fine in TCP mode), then you could try to add "option http-pretend-keepalive" so that haproxy removes this header before passing the request to the server.

If you could also share a capture of one request forwarded by Nginx to the server, that would be nice. We may find a minor difference which could tell us what makes lighttpd behave badly here.

Cheers,
Willy
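For reference, the two experiments suggested above would look roughly like this in a config (a sketch only; the backend and server names are placeholders modeled on the setup discussed later in the thread, not a confirmed configuration):

```haproxy
backend lighty_load_balancer
    mode http
    # Experiment 1: make sure this line is absent, so haproxy does not
    # force a close towards the server:
    # option http-server-close

    # Experiment 2: hide the client's "Connection: close" from the
    # server; haproxy then takes care of closing the server connection
    # itself once the response has been received:
    option http-pretend-keepalive

    server lighty0 127.0.0.1:9000 check
```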
Re: Early connection close, incomplete transfers
On Fri, Feb 15, 2019 at 09:31:35AM +0100, Aleksandar Lazic wrote:
> On 15.02.2019 at 08:47, Veiko Kukk wrote:
> > On 2019-02-14 18:29, Aleksandar Lazic wrote:
> > > > Replaced HAproxy with Nginx for testing and with Nginx, not a single
> > > > connection was interrupted, did millions of requests.
> > >
> > > A lot of fixes were added in 1.9.4.
> > > Please can you try your tests with 1.9.4, thanks.
> >
> > Already did before writing my previous letter. No differences.
>
> Okay. Can you try to make some tcpdumps between haproxy and lighty? We need
> to know what happens on the wire.

Just for the record, a week ago Veiko sent me some traces that I still didn't have the time to look at. I'll try to have a look as time permits.

Cheers,
Willy
Re: Early connection close, incomplete transfers
On 15.02.2019 at 08:47, Veiko Kukk wrote:
> On 2019-02-14 18:29, Aleksandar Lazic wrote:
> > > Replaced HAproxy with Nginx for testing and with Nginx, not a single
> > > connection was interrupted, did millions of requests.
> >
> > A lot of fixes were added in 1.9.4.
> > Please can you try your tests with 1.9.4, thanks.
>
> Already did before writing my previous letter. No differences.

Okay. Can you try to make some tcpdumps between haproxy and lighty? We need to know what happens on the wire.

> Veiko

Regards
Aleks
Re: Early connection close, incomplete transfers
On 2019-02-14 18:29, Aleksandar Lazic wrote:
> > Replaced HAproxy with Nginx for testing and with Nginx, not a single
> > connection was interrupted, did millions of requests.
>
> A lot of fixes were added in 1.9.4.
> Please can you try your tests with 1.9.4, thanks.

Already did before writing my previous letter. No differences.

Veiko
Re: Early connection close, incomplete transfers
On 14.02.2019 at 15:31, Veiko Kukk wrote:
> On 2019-02-01 13:30, Veiko Kukk wrote:
> > On 2019-02-01 12:34, Aleksandar Lazic wrote:
> > >
> > > Do you have any errors in lighthttpds log?
> >
> > Yes, it has error messages about not being able to write to the socket.
> >
> > Unrecoverable error writing to socket! errno 32, retries 12, ppoll
> > return 1, send return -1
> > ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
> >
> > I've tested with several hundred thousand requests, but it never
> > happens when using "mode tcp".
>
> Replaced HAproxy with Nginx for testing and with Nginx, not a single
> connection was interrupted, did millions of requests.

A lot of fixes were added in 1.9.4.
Please can you try your tests with 1.9.4, thanks.

> Veiko

Regards
aleks
Re: Early connection close, incomplete transfers
On 2019-02-01 13:30, Veiko Kukk wrote:
> On 2019-02-01 12:34, Aleksandar Lazic wrote:
> >
> > Do you have any errors in lighthttpds log?
>
> Yes, it has error messages about not being able to write to the socket.
>
> Unrecoverable error writing to socket! errno 32, retries 12, ppoll
> return 1, send return -1
> ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
>
> I've tested with several hundred thousand requests, but it never
> happens when using "mode tcp".

Replaced HAproxy with Nginx for testing and with Nginx, not a single connection was interrupted, did millions of requests.

Veiko
Re: Early connection close, incomplete transfers
Hi Veiko,

On Mon, Feb 04, 2019 at 01:52:28PM +, Veiko Kukk wrote:
> I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (did not
> try 1.8, because we have never used it in production and decided to switch
> directly to 1.9), but how could we make sure it's caused by something
> different between versions if we observe very similar results? Since it's
> happening at random, it's hard to judge if there is a slight change in one
> or another direction. Logs look the same for all versions.

OK, that's what I needed to know.

> Only 'mode tcp' helps to get rid of those errors.
>
> > Do you know if the response headers are properly delivered to the
> > client when this happens? And do they match what you expected? Maybe
> > the contents are sometimes invalid and the response rejected by haproxy,
> > in which case a 502 would be returned to the client. When this happens,
> > emitting "show errors" on the CLI will report it.
>
> I don't know about the headers; I don't have a good tool to capture
> headers for failed connections only. Any suggestions?

In this case it's always the same, tcpdump...

> echo "show errors" | /usr/bin/socat /var/run/haproxy.sock.stats1 stdio
> Total events captured on [04/Feb/2019:13:46:33.167] : 0

Great, so it's not an accidental protocol issue that causes this.

> > Could you also check if this happens only/more with keep-alive, close or
> > server-close?
>
> I have seen no difference, unfortunately.

OK.

(...)

> > - option keep-alive
>
> I assume you meant 'option http-keep-alive' because there is no 'option
> keep-alive'.

Indeed, but you corrected it yourself :-)

> No difference, lots of incomplete transfers.
>
> > I'm asking for 2.0-dev because it's where all known bugs are fixed.
> > If none of these settings has any effect, we'll have to look at network
> > traces I'm afraid.
>
> Would you like to have a network traffic dump?

Yes, at this point I would really appreciate it (port 80, not 443). Even a single faulty session: one capture between the client and haproxy, and the equivalent session between haproxy and the server. Please make sure these do match, it's very important. If your requests/responses contain a unique ID or a timestamp it should be OK. Alternatively we can add the source port into a header; that eases lookups a lot.

Thanks!
Willy
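Such a capture pair could be taken with tcpdump; a sketch assuming a loopback setup like the one in the config posted elsewhere in the thread (frontend on port 80, lighttpd backend on 127.0.0.1:9000; the interface and the backend port are assumptions):

```shell
# client <-> haproxy leg (plain-HTTP frontend)
tcpdump -i lo -s 0 -w front.pcap 'tcp port 80'

# haproxy <-> lighttpd leg (backend server)
tcpdump -i lo -s 0 -w back.pcap 'tcp port 9000'
```

Matching sessions between the two files can then be done by the unique request ID, a timestamp, or (as suggested above) the client's source port echoed into a header.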
Re: Early connection close, incomplete transfers
On 2019-02-01 17:02, Willy Tarreau wrote:
> Hi Veiko,
>
> Are you certain that 1.9 and 1.7 have the same issue? I mean, you could be
> observing two different cases looking similar. If you're sure it's the same
> issue, it could rule out a number of parts that differ quite a lot between
> the two (idle conns etc).

I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (did not try 1.8, because we have never used it in production and decided to switch directly to 1.9), but how could we make sure it's caused by something different between versions if we observe very similar results? Since it's happening at random, it's hard to judge if there is a slight change in one or another direction. Logs look the same for all versions. Only 'mode tcp' helps to get rid of those errors.

> Do you know if the response headers are properly delivered to the client
> when this happens? And do they match what you expected? Maybe the contents
> are sometimes invalid and the response rejected by haproxy, in which case a
> 502 would be returned to the client. When this happens, emitting "show
> errors" on the CLI will report it.

I don't know about the headers; I don't have a good tool to capture headers for failed connections only. Any suggestions?

echo "show errors" | /usr/bin/socat /var/run/haproxy.sock.stats1 stdio
Total events captured on [04/Feb/2019:13:46:33.167] : 0

> Could you also check if this happens only/more with keep-alive, close or
> server-close?

I have seen no difference, unfortunately.

> If you can run more tests in your test environment, I'd be interested in
> seeing how latest 2.0-dev works with these variants:

Tested with http://www.haproxy.org/download/2.0/src/snapshot/haproxy-ss-20190204.tar.gz

> - http-reuse never

No difference, lots of incomplete transfers.

> - http-reuse always

No difference, lots of incomplete transfers.

> - option httpclose

No difference, lots of incomplete transfers.

> - option http-server-close

No difference, lots of incomplete transfers.

> - option keep-alive

I assume you meant 'option http-keep-alive' because there is no 'option keep-alive'. No difference, lots of incomplete transfers.

> I'm asking for 2.0-dev because it's where all known bugs are fixed.
> If none of these settings has any effect, we'll have to look at network
> traces I'm afraid.

Would you like to have a network traffic dump?

Regards,
Veiko
Re: Early connection close, incomplete transfers
Hi Veiko,

On Thu, Jan 31, 2019 at 09:29:43AM +, Veiko Kukk wrote:
> Connections are getting closed during data transfer phase at random sizes on
> backend. Sometimes just as little as 420 bytes get transferred, but usually
> more is transferred before sudden end of connection. HAproxy logs have
> connection closing status SD-- when this happens.
>
> Basic components of system look like this:
> Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin
>
> Our HTTP server part is compiling data from chunks it gets from local cache.
> When it receives request from client via HAproxy, it sends response header,
> then fetches chunks, compiles those and sends data to client.
>
> SD-- happens more frequently when connection between benchmarking tool and
> HAproxy is fast, e.g. when doing tests where client side is not loaded much.
> Happens much more for http than for https.

I'm really not aware of such an issue. However, what's true is that responses which send only headers first and data later are rare, so it could be possible that this issue has been there for a while but noticed by almost nobody.

Are you certain that 1.9 and 1.7 have the same issue? I mean, you could be observing two different cases looking similar. If you're sure it's the same issue, it could rule out a number of parts that differ quite a lot between the two (idle conns etc).

Do you know if the response headers are properly delivered to the client when this happens? And do they match what you expected? Maybe the contents are sometimes invalid and the response rejected by haproxy, in which case a 502 would be returned to the client. When this happens, emitting "show errors" on the CLI will report it.

(...)

> Because of thread count differences, -t1 (one thread) test is much more
> loaded on client side than it is with -t10 (ten threads).

This could indicate a sensitivity to timing then: the more requests, the higher the load on the servers, the higher the response time, and the more likely a kill before getting everything.

> Random samples from HAproxy log (proper size of the object in HAproxy logs
> is 667960042 bytes for that test file).
> 0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
> 0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

So indeed these ones are not 502 but plain 200. Do you think the timings match the next data chunk sent by the server, for example? It's interesting to see that the response headers are really immediate here. If the timers had all been the same, we could have imagined that an earlier timer fired (e.g. the connect timer).

> Test run directly against HTTP server do not return any errors in any
> combination of test load, that's why we are suspecting that bug is in
> HAproxy.

Could you also check if this happens only/more with keep-alive, close or server-close?

> Could it be that HAproxy is somehow ending wrong connection or counts bytes
> transferred improperly?

What could happen is that the server sends malformed chunks that are rejected by the parser, or that it advertises chunked encoding but does not use it when sending the response. But that would be very strange as it should not depend on the load at all.

If you can run more tests in your test environment, I'd be interested in seeing how latest 2.0-dev works with these variants:
- http-reuse never
- http-reuse always
- option httpclose
- option http-server-close
- option keep-alive

I'm asking for 2.0-dev because it's where all known bugs are fixed. If none of these settings has any effect, we'll have to look at network traces I'm afraid.
Regards, Willy
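As a side note to the remark about malformed chunks: HTTP/1.1 chunked framing is simply a hex size line, the data, and a CRLF, repeated until a zero-size chunk terminates the body. A minimal sketch of well-formed framing (hypothetical helper names, not haproxy's actual parser):

```python
def encode_chunked(chunks):
    """Frame a list of byte chunks as an HTTP/1.1 chunked body."""
    out = b""
    for c in chunks:
        out += f"{len(c):x}".encode() + b"\r\n" + c + b"\r\n"
    return out + b"0\r\n\r\n"          # zero-size chunk ends the body

def decode_chunked(data):
    """Reassemble the body; raises if the framing is malformed."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)  # hex chunk-size line
        if size == 0:
            return body
        body += data[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2       # skip data plus trailing CRLF

wire = encode_chunked([b"hello ", b"world"])
body = decode_chunked(wire)
```

A server that advertises `Transfer-Encoding: chunked` but then writes the raw body without this framing would be rejected by a strict parser, which is the failure mode mentioned above (and ruled out here by the empty "show errors" output).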
Re: Early connection close, incomplete transfers
On 2019-02-01 12:34, Aleksandar Lazic wrote:
> Do you have any errors in lighthttpds log?

Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll return 1, send return -1
ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never happens when using "mode tcp".

Regards,
Veiko
Re: Early connection close, incomplete transfers
Hi.

Do you have any errors in lighthttpds log?

Regards
Aleks

-------- Original message --------
From: Veiko Kukk
Sent: 1 February 2019 12:33:39 CET
To: Aleksandar Lazic
CC: haproxy@formilux.org
Subject: Re: Early connection close, incomplete transfers

On 2019-01-31 12:57, Aleksandar Lazic wrote:
> Willy found some issues whose fixes were added to the 2.0 tree.
> Do you have a chance to test this branch or do you want to wait for
> the next 1.9 release?

I tested stable 1.9.3 and the 1.9 preview version Willy gave a link to here:
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html
There is no difference in my tests.

> I'm not sure if it affects you as we haven't seen the config yet.
> Maybe you can share your config also so that we can see if your setup
> could be affected.

Commented timeouts are the original timeouts; I had increased those to make sure I'm not hitting any timeouts when creating higher load with tests. Maxconn values serve the same purpose.

global
    log /dev/log local0
    daemon
    nbproc 1
    nbthread 16
    maxconn
    user haproxy
    spread-checks 5
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    ssl-default-server-options no-sslv3 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    tune.ssl.cachesize 10
    tune.ssl.lifetime 1800
    stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 1 level admin

defaults
    log global
    mode http
    option httplog
    option contstats
    option log-health-checks
    retries 5
    #timeout http-request 5s
    timeout http-request 99s
    #timeout http-keep-alive 20s
    timeout http-keep-alive 99s
    #timeout connect 10s
    timeout connect 99s
    #timeout client 30s
    timeout client 99s
    timeout server 120s
    #timeout client-fin 10s
    timeout client-fin 99s
    #timeout server-fin 10s
    timeout server-fin 99s

listen main_frontend
    bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ maxconn
    bind *:80 maxconn
    option forwardfor
    acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
    monitor-uri /load_balance_health
    monitor fail if local_lighty_down
    default_backend lighty_load_balancer

backend lighty_load_balancer
    stats enable
    stats realm statistics
    http-response set-header Access-Control-Allow-Origin *
    option httpchk HEAD /dl/index.html
    server lighty0 127.0.0.1:9000 check maxconn fall 2 inter 15s rise 5 id 1

Test results, httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 bytes, 448317 overhead
TOTALS: 1 connect, 1 requests, 9809 success, 191 fail, 100 (100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 overhead
TIMING: 81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time

HAproxy log sections of incomplete transfers (6320535 bytes should be transferred with this test data set):

127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 0/0 "
127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 0/0
Re: Early connection close, incomplete transfers
On 2019-01-31 12:57, Aleksandar Lazic wrote:
> Willy found some issues whose fixes were added to the 2.0 tree.
> Do you have a chance to test this branch or do you want to wait for
> the next 1.9 release?

I tested stable 1.9.3 and the 1.9 preview version Willy gave a link to here:
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html
There is no difference in my tests.

> I'm not sure if it affects you as we haven't seen the config yet.
> Maybe you can share your config also so that we can see if your setup
> could be affected.

Commented timeouts are the original timeouts; I had increased those to make sure I'm not hitting any timeouts when creating higher load with tests. Maxconn values serve the same purpose.

global
    log /dev/log local0
    daemon
    nbproc 1
    nbthread 16
    maxconn
    user haproxy
    spread-checks 5
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    ssl-default-server-options no-sslv3 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    tune.ssl.cachesize 10
    tune.ssl.lifetime 1800
    stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 1 level admin

defaults
    log global
    mode http
    option httplog
    option contstats
    option log-health-checks
    retries 5
    #timeout http-request 5s
    timeout http-request 99s
    #timeout http-keep-alive 20s
    timeout http-keep-alive 99s
    #timeout connect 10s
    timeout connect 99s
    #timeout client 30s
    timeout client 99s
    timeout server 120s
    #timeout client-fin 10s
    timeout client-fin 99s
    #timeout server-fin 10s
    timeout server-fin 99s

listen main_frontend
    bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ maxconn
    bind *:80 maxconn
    option forwardfor
    acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
    monitor-uri /load_balance_health
    monitor fail if local_lighty_down
    default_backend lighty_load_balancer

backend lighty_load_balancer
    stats enable
    stats realm statistics
    http-response set-header Access-Control-Allow-Origin *
    option httpchk HEAD /dl/index.html
    server lighty0 127.0.0.1:9000 check maxconn fall 2 inter 15s rise 5 id 1

Test results, httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 bytes, 448317 overhead
TOTALS: 1 connect, 1 requests, 9809 success, 191 fail, 100 (100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 overhead
TIMING: 81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time

HAproxy log sections of incomplete transfers (6320535 bytes should be transferred with this test data set):

127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 0/0 "
127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 0/0 "
127.0.0.1:34658 [01/Feb/2019:11:22:49.447] main_frontend lighty_load_balancer/lighty0 0/0/0/0/254 200 425 - - SD-- 100/100/98/98/0 0/0 "
127.0.0.1:34386 [01/Feb/2019:11:22:49.290] main_frontend lighty_load_balancer/lighty0 0/0/0/0/412 200 425 - - SD-- 100/100/98/98/0 0/0 "
127.0.0.1:34388 [01/Feb/2019:11:22:49.290] main_frontend
Re: Early connection close, incomplete transfers
Hi.

On 31.01.2019 at 10:29, Veiko Kukk wrote:
> HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.
>
> Connections are getting closed during data transfer phase at random sizes on
> backend. Sometimes just as little as 420 bytes get transferred, but usually
> more is transferred before sudden end of connection. HAproxy logs have
> connection closing status SD-- when this happens.

Willy found some issues whose fixes were added to the 2.0 tree. Do you have a chance to test this branch, or do you want to wait for the next 1.9 release?

I'm not sure if it affects you as we haven't seen the config yet. Maybe you can share your config also, so that we can see if your setup could be affected.

Best regards
Aleks

> Basic components of system look like this:
> Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin
>
> Our HTTP server part is compiling data from chunks it gets from local cache.
> When it receives request from client via HAproxy, it sends response header,
> then fetches chunks, compiles those and sends data to client.
>
> SD-- happens more frequently when connection between benchmarking tool and
> HAproxy is fast, e.g. when doing tests where client side is not loaded much.
> Happens much more for http than for https.
>
> For example:
>
> httpress -t1 -c10 -n1000 URL (rarely or not at all)
> 250 requests launched
> 500 requests launched
> 750 requests launched
> 1000 requests launched
>
> TOTALS: 1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real concurrency
> TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 452000 overhead
> TIMING: 241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time
>
> httpress -t10 -c10 -n1000 URL (happens frequently)
>
> 2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
> 1000 requests launched
> 2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection closed
> thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
> thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 bytes, 32544 overhead
> thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
> thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 bytes, 30284 overhead
> thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 bytes, 25312 overhead
> thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 bytes, 30736 overhead
> thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 bytes, 30736 overhead
> thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 bytes, 13108 overhead
> thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 bytes, 32996 overhead
> thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 bytes, 17628 overhead
>
> TOTALS: 1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) real concurrency
> TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 278432 overhead
> TIMING: 170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time
>
> Because of thread count differences, -t1 (one thread) test is much more
> loaded on client side than it is with -t10 (ten threads).
>
> Random samples from HAproxy log (proper size of the object in HAproxy logs is
> 667960042 bytes for that test file).
> 0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
> 0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0
>
> Our HTTP server logs contain hard unrecoverable errors about being unable to
> write to socket when HAproxy closes connection:
> Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
> Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 msec
> Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 msec
> Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 msec
> Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 msec
> Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 msec
> Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 msec
> Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 msec
Early connection close, incomplete transfers
HAProxy 1.9.3, but it also happens with 1.7.10 and 1.7.11. Connections are getting closed at random sizes on the backend during the data transfer phase. Sometimes as little as 420 bytes gets transferred, but usually more makes it through before the sudden end of the connection. The HAProxy logs show connection closing status SD-- when this happens.

The basic components of the system look like this:

Client --> HAProxy --> HTTP server --> Caching Proxy --> Remote origin

Our HTTP server compiles data from chunks it gets from a local cache. When it receives a request from a client via HAProxy, it sends the response header, then fetches the chunks, compiles them, and sends the data to the client. SD-- happens more frequently when the connection between the benchmarking tool and HAProxy is fast, e.g. in tests where the client side is not heavily loaded. It happens much more often for HTTP than for HTTPS.

For example:

httpress -t1 -c10 -n1000 URL (rarely or not at all)

250 requests launched
500 requests launched
750 requests launched
1000 requests launched

TOTALS: 1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 452000 overhead
TIMING: 241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time

httpress -t10 -c10 -n1000 URL (happens frequently)

2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection closed
thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 bytes, 32544 overhead
thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 bytes, 30284 overhead
thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 bytes, 25312 overhead
thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 bytes, 30736 overhead
thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 bytes, 30736 overhead
thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 bytes, 13108 overhead
thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 bytes, 32996 overhead
thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 bytes, 17628 overhead

TOTALS: 1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 278432 overhead
TIMING: 170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time

Because of the thread count differences, the -t1 (one thread) test loads the client side much more than the -t10 (ten threads) test does.

Random samples from the HAProxy log (the proper size of the object in the HAProxy logs is 667960042 bytes for that test file):

0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

Our HTTP server logs contain hard, unrecoverable errors about being unable to write to the socket when HAProxy closes the connection:

Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 msec
Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 msec
Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 msec
Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 msec
Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979 msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087 msec
Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918 msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977 msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079 msec
Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060 msec
Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881 msec
Return Code: 32.