Re: Early connection close, incomplete transfers

2019-02-20 Thread Veiko Kukk

On 2019-02-19 06:47, Willy Tarreau wrote:


This is interesting. As you observed in the trace you sent me, the
lighttpd server closes just after sending the response headers. This
indeed matches the "SD" log that haproxy emits. If it doesn't happen
in TCP mode nor with Nginx, it means that something haproxy modifies
in the request causes this effect on the server.


Hi

I'm forwarding the answer from a colleague who investigated this more 
thoroughly, especially from the lighttpd side:


We've been debugging this a bit further and it does not look like the 
issue with the seemingly random incomplete HTTP responses is due to any 
particular request headers at the HTTP layer. It rather looks like 
something at the TCP level (so specific to HTTP mode):


A first observation we made is that the frequency of these incomplete 
transfers increases when we add a delay at the backend server after 
sending the response headers and before sending the body data. We added 
a 100 ms delay there and then got a lot of interrupted transfers that 
had only received the response headers (= no delay) but 0 bytes of the 
body (= which was sent just after the delay). So the frequency with which 
this happens appears to be proportional to latencies/stalls in the 
backend server sending the response data (some read timeout logic at 
haproxy??).


We debugged further and noticed that in all cases where transfers were 
incomplete our lighttpd backend server was receiving an EPOLLRDHUP event 
on the socket where it communicates with haproxy. So it appears as if 
haproxy is *sometimes* (apparently depending on some read latency/stall 
- see above) shutting down its socket with the backend for writing 
*before* the full response and body data have been received.
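
For illustration, what lighttpd observes here at the event-loop level is the 
EPOLLRDHUP condition that epoll reports when the peer shuts down its sending 
direction. Below is a minimal, hedged sketch of registering for and reacting 
to that event; the helper names are made up and this is illustrative C only, 
not lighttpd's or haproxy's actual code:

  /* Sketch: watch a connection fd for data and for a peer half-close
   * (EPOLLRDHUP). Assumes conn_fd is an accepted, non-blocking TCP socket. */
  #include <stdio.h>
  #include <sys/epoll.h>

  static int watch_conn(int epfd, int conn_fd)
  {
      struct epoll_event ev = { 0 };

      ev.events  = EPOLLIN | EPOLLRDHUP;  /* readable data or peer FIN */
      ev.data.fd = conn_fd;
      return epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &ev);
  }

  static void handle_event(const struct epoll_event *ev)
  {
      if (ev->events & EPOLLRDHUP) {
          /* The peer will send no more data, but the socket may still be
           * writable, so a pending response body could still be sent. */
          printf("fd %d: peer closed its sending side\n", ev->data.fd);
      }
  }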


And this is also basically ok because the socket remains writeable for 
lighttpd and so it could still send down the rest of the response data. 
However, it looks like lighttpd is not expecting this kind of behavior 
from the client and is not correctly handling such a half-closed TCP 
session. There is code in lighttpd to handle such an EPOLLRDHUP event and 
half-closed TCP connection, but lighttpd then also checks the state of 
the TCP session with getsockopt and keeps the connection open *only* 
when the state is TCP_CLOSE_WAIT. In all other cases upon receiving the 
EPOLLRDHUP it actively changes the state of the connection to "ERROR" 
and then closes the connection:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L908
https://github.com/lighttpd/lighttpd1.4/blob/master/src/fdevent.c#L995
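
To make the check described above concrete: it roughly amounts to querying 
the kernel's TCP state with getsockopt(TCP_INFO) after EPOLLRDHUP and 
accepting only TCP_CLOSE_WAIT. The sketch below is a simplified, 
Linux-specific illustration (the function name is invented); see the links 
above for lighttpd's real implementation:

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Sketch: return 1 if the connection may be kept open after EPOLLRDHUP
   * (peer half-closed, session in CLOSE_WAIT), 0 if it should be treated
   * as an error, mirroring the behaviour described above. */
  static int keep_half_closed(int fd)
  {
      struct tcp_info ti;
      socklen_t len = sizeof(ti);

      if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) != 0)
          return 0;                        /* cannot tell: treat as error */

      return ti.tcpi_state == TCP_CLOSE_WAIT;
  }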

We checked and every time we have an incomplete response lighttpd 
receives the EPOLLRDHUP event on the socket, but the TCP state queried 
via getsockopt is always TCP_CLOSE (and not TCP_CLOSE_WAIT as lighttpd 
seems to expect). And because of this lighttpd then actively closes the 
half-closed connection also from its end (which likely is the cause of 
the TCP FIN sent by lighttpd as seen in the tcpdump).


When we remove this condition from lighttpd, which marks the connection 
as erroneous in case of EPOLLRDHUP and TCP state != TCP_CLOSE_WAIT, then 
the problem with the incomplete transfers disappears:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L922

We do not understand why this is or what the correct reaction to the 
EPOLLRDHUP event should be. In particular, we do not understand why 
lighttpd performs this check for TCP_CLOSE_WAIT, or why we always get a 
state of TCP_CLOSE when we receive this event while the socket still 
continues to be writable (so does the TCP_CLOSE just indicate that one 
direction of the connection is closed??). Still, because this 
half-closing of the connection to the backend server appears to happen 
pretty randomly and depending on latency/stalls of the backend server 
sending down the response data, we assume that this is not the intended 
behavior of haproxy (and so possibly indicates a bug in haproxy too).
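
As a point of reference on half-close semantics: a client that calls 
shutdown(fd, SHUT_WR) only announces that it will send nothing more; the peer 
sees EPOLLRDHUP (or read() returning 0), yet both sides can keep using the 
other direction until it is closed as well. The sketch below shows that 
pattern in isolation; it is an illustrative assumption of what such a 
client-side half-close looks like at the socket API level, not haproxy's 
actual code:

  #include <sys/socket.h>
  #include <unistd.h>

  /* Sketch: half-close the sending side of a connected TCP socket and keep
   * reading the rest of the response until the peer closes its side too. */
  static void finish_request(int fd)
  {
      char buf[4096];
      ssize_t n;

      shutdown(fd, SHUT_WR);              /* "I will not send anything more" */

      while ((n = read(fd, buf, sizeof(buf))) > 0)
          ;                               /* drain remaining response bytes */

      close(fd);                          /* now the session is fully closed */
  }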


We assume that the reason why direct requests to the backend server or 
requests proxied via Nginx never failed is that in these cases the 
EPOLLRDHUP event never occurs and there never are half-closed 
connections. However, we have not tested this (yet), i.e. we did not 
re-test with Nginx to verify that lighttpd then indeed never sees an 
EPOLLRDHUP.


Any ideas or suggestions, based on these findings, as to what the proper 
solution to the problem should be?


Thank you.



Re: Early connection close, incomplete transfers

2019-02-18 Thread Willy Tarreau
Hi Veiko,

On Thu, Feb 14, 2019 at 02:31:42PM +, Veiko Kukk wrote:
> 
> On 2019-02-01 13:30, Veiko Kukk wrote:
> > On 2019-02-01 12:34, Aleksandar Lazic wrote:
> > 
> > > Do you have any errors in lighthttpds log?
> > 
> > Yes, it has error messages about not being able to write to the socket.
> > 
> > Unrecoverable error writing to socket! errno 32, retries 12, ppoll
> > return 1, send return -1
> > ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
> > 
> > I've tested with several hundred thousand requests, but it never
> > happens when using "mode tcp".
> 
> Replaced HAproxy with Nginx for testing and with Nginx, not a single
> connection was interrupted, did millions of requests.

This is interesting. As you observed in the trace you sent me, the
lighttpd server closes just after sending the response headers. This
indeed matches the "SD" log that haproxy emits. If it doesn't happen
in TCP mode nor with Nginx, it means that something haproxy modifies
in the request causes this effect on the server.

The request you isolated looks like this :

  GET /video/_redacted_ HTTP/1.1
  Host: 127.0.0.1
  X-Request-Id: XXX_redacted_XXX
  Connection: close
  X-Forwarded-For: 127.0.0.1

So here is what haproxy has apparently modified:
  - the X-Request-Id header field that was appended
  - the "Connection: close" header field because you likely have
"option http-server-close" in the config ;
  - the "X-Forwarded-For:" header field (option forwardfor)

Among them I hardly see why the request-id or XFF could cause any issue,
but we could imagine that for any reason the server occasionally has
trouble with "connection: close".

If you have "option http-server-close" in your config, it would be nice
to remove it and see if it changes anything. If you don't have it because
the request already arrives with close (but then I hardly understand how
it would work fine in TCP mode), then you could try to add
"option http-pretend-keepalive" so that haproxy removes this header
before passing the request to the server.

If you could also share a capture of one request forwarded by Nginx to
the server it would be nice. We may find a minor difference which could
tell us what makes lighttpd behave badly here.

Cheers,
Willy



Re: Early connection close, incomplete transfers

2019-02-18 Thread Willy Tarreau
On Fri, Feb 15, 2019 at 09:31:35AM +0100, Aleksandar Lazic wrote:
> On 15.02.2019 at 08:47, Veiko Kukk wrote:
> > On 2019-02-14 18:29, Aleksandar Lazic wrote:
> >>> Replaced HAproxy with Nginx for testing and with Nginx, not a single 
> >>> connection
> >>> was interrupted, did millions of requests.
> >>
> >> In 1.9.4 a lot of fixes were added.
> >> Please can you try your tests with 1.9.4, thanks.
> > 
> > Already did before writing my previous letter. No differences.
> 
> Okay. Can you try to make some tcpdumps between haproxy and lighty? We need to
> know what happens on the wire.

Just for the record, a week ago Veiko sent me some traces that I still
didn't have the time to look at. I'll try to have a look as time permits.

Cheers,
Willy



Re: Early connection close, incomplete transfers

2019-02-15 Thread Aleksandar Lazic
On 15.02.2019 at 08:47, Veiko Kukk wrote:
> On 2019-02-14 18:29, Aleksandar Lazic wrote:
>>> Replaced HAproxy with Nginx for testing and with Nginx, not a single 
>>> connection
>>> was interrupted, did millions of requests.
>>
>> In 1.9.4 a lot of fixes were added.
>> Please can you try your tests with 1.9.4, thanks.
> 
> Already did before writing my previous letter. No differences.

Okay. Can you try to make some tcpdumps between haproxy and lighty? We need to
know what happens on the wire.

> Veiko

Regards
Aleks



Re: Early connection close, incomplete transfers

2019-02-14 Thread Veiko Kukk

On 2019-02-14 18:29, Aleksandar Lazic wrote:
Replaced HAproxy with Nginx for testing and with Nginx, not a single 
connection

was interrupted, did millions of requests.


In 1.9.4 a lot of fixes were added.
Please can you try your tests with 1.9.4, thanks.


Already did before writing my previous letter. No differences.

Veiko



Re: Early connection close, incomplete transfers

2019-02-14 Thread Aleksandar Lazic
On 14.02.2019 at 15:31, Veiko Kukk wrote:
> 
> On 2019-02-01 13:30, Veiko Kukk wrote:
>> On 2019-02-01 12:34, Aleksandar Lazic wrote:
>>
>>> Do you have any errors in lighthttpds log?
>>
>> Yes, it has error messages about not being able to write to the socket.
>>
>> Unrecoverable error writing to socket! errno 32, retries 12, ppoll
>> return 1, send return -1
>> ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
>>
>> I've tested with several hundred thousand requests, but it never
>> happens when using "mode tcp".
> 
> Replaced HAproxy with Nginx for testing and with Nginx, not a single 
> connection
> was interrupted, did millions of requests.

In 1.9.4 a lot of fixes were added.
Please can you try your tests with 1.9.4, thanks.

> Veiko
Regards
aleks



Re: Early connection close, incomplete transfers

2019-02-14 Thread Veiko Kukk



On 2019-02-01 13:30, Veiko Kukk wrote:

On 2019-02-01 12:34, Aleksandar Lazic wrote:


Do you have any errors in lighthttpds log?


Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll
return 1, send return -1
ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never
happens when using "mode tcp".


Replaced HAproxy with Nginx for testing and with Nginx, not a single 
connection was interrupted, did millions of requests.


Veiko



Re: Early connection close, incomplete transfers

2019-02-04 Thread Willy Tarreau
Hi Veiko,

On Mon, Feb 04, 2019 at 01:52:28PM +, Veiko Kukk wrote:
> I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (did not
> try 1.8, because we have never used it in production and decided to switch
> directly to 1.9), but how could we make sure it's caused by something
> different between versions if we observe very similar results. Since it's
> happening at random, it's hard to judge if there is a slight change in one or
> another direction. Logs look the same for all versions.

OK, that's what I needed to know.

> Only 'mode tcp' helps to get rid of those errors.
> 
> > Do you know if the responses headers are properly delivered to the
> > client when this happens ? And do they match what you expected ? Maybe
> > the contents are sometimes invalid and the response rejected by haproxy,
> > in which case a 502 would be returned to the client. When this happens,
> > emitting "show errors" on the CLI will report it.
> 
> I don't know; I don't know about the headers, and I don't have a good tool to
> capture headers for failed connections only. Any suggestions?

In this case it's always the same, tcpdump...

> echo "show errors" |/usr/bin/socat /var/run/haproxy.sock.stats1 stdio
> Total events captured on [04/Feb/2019:13:46:33.167] : 0

Great so it's not an accidental protocol issue that causes this.

> > Could you also check if this happens only/more with keep-alive, close or
> > server-close ?
> 
> I have seen no difference, unfortunately.

OK.

(...)
> >   - option keep-alive
> 
> I assume you meant 'option http-keep-alive' because there is no 'option
> keep-alive'.

Indeed, but you corrected it yourself :-)

> No difference, lots of incomplete transfers.
> 
> > I'm asking for 2.0-dev because it's where all known bugs are fixed.
> > If none of these settings has any effect, we'll have to look at network
> > traces I'm afraid.
> 
> Would you like to have a network traffic dump?

Yes at this point I would really appreciate it (port 80, not 443). Even a
single faulty session, one capture between the client and haproxy and the
equivalent session between haproxy and the server. Please make sure these
do match, it's very important. If your requests/responses contain a unique
ID or a timestamp it should be OK. Alternatively we can add the source port
into a header, that eases lookups a lot.

Thanks!
Willy



Re: Early connection close, incomplete transfers

2019-02-04 Thread Veiko Kukk

On 2019-02-01 17:02, Willy Tarreau wrote:

Hi Veiko,
Are you certain that 1.9 and 1.7 have the same issue ? I mean, you
could be observing two different cases looking similar. If you're
sure it's the same issue, it could rule out a number of parts that
differ quite a lot between the two (idle conns etc).


I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (did 
not try 1.8, because we have never used it in production and decided to 
switch directly to 1.9), but how could we make sure it's caused by 
something different between versions if we observe very similar results. 
Since it's happening at random, it's hard to judge if there is a slight 
change in one or another direction. Logs look the same for all versions.


Only 'mode tcp' helps to get rid of those errors.


Do you know if the response headers are properly delivered to the
client when this happens? And do they match what you expected? Maybe
the contents are sometimes invalid and the response rejected by haproxy,
in which case a 502 would be returned to the client. When this happens,
emitting "show errors" on the CLI will report it.


I don't know; I don't know about the headers, and I don't have a good tool 
to capture headers for failed connections only. Any suggestions?


echo "show errors" |/usr/bin/socat /var/run/haproxy.sock.stats1 stdio
Total events captured on [04/Feb/2019:13:46:33.167] : 0

Could you also check if this happens only/more with keep-alive, close or
server-close?


I have seen no difference, unfortunately.

If you can run more tests in your test environment, I'd be interested in
seeing how latest 2.0-dev works with these variants:


Tested with 
http://www.haproxy.org/download/2.0/src/snapshot/haproxy-ss-20190204.tar.gz



  - http-reuse never


No difference, lots of incomplete transfers.


  - http-reuse always


No difference, lots of incomplete transfers.


  - option httpclose


No difference, lots of incomplete transfers.


  - option http-server-close


No difference, lots of incomplete transfers.


  - option keep-alive


I assume you meant 'option http-keep-alive' because there is no 'option 
keep-alive'.

No difference, lots of incomplete transfers.


I'm asking for 2.0-dev because it's where all known bugs are fixed.
If none of these settings has any effect, we'll have to look at network
traces I'm afraid.


Would you like to have a network traffic dump?

Regards,
Veiko



Re: Early connection close, incomplete transfers

2019-02-01 Thread Willy Tarreau
Hi Veiko,

On Thu, Jan 31, 2019 at 09:29:43AM +, Veiko Kukk wrote:
> Connections are getting closed during data transfer phase at random sizes on
> backend. Sometimes just as little as 420 bytes get transferred, but usually
> more is transferred before sudden end of connection. HAproxy logs have
> connection closing status SD-- when this happens.
> 
> Basic components of system look like this:
> Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin
> 
> Our HTTP server part is compiling data from chunks it gets from local cache.
> When it receives request from client via HAproxy, it sends response header,
> then fetches chunks, compiles those and sends data client.
> 
> SD-- happens more frequently when connection between benchmarking tool and
> HAproxy is fast, e.g. when doing tests where client side is not loaded much.
> Happens much more for http than for https.

I'm really not aware of such an issue. However, what's true is that
responses which send only headers first and data later are rare, so
it could be possible that this issue has been there for a while but
noticed by almost nobody.

Are you certain that 1.9 and 1.7 have the same issue ? I mean, you
could be observing two different cases looking similar. If you're
sure it's the same issue, it could rule out a number of parts that
differ quite a lot between the two (idle conns etc).

Do you know if the response headers are properly delivered to the
client when this happens ? And do they match what you expected ? Maybe
the contents are sometimes invalid and the response rejected by haproxy,
in which case a 502 would be returned to the client. When this happens,
emitting "show errors" on the CLI will report it.

(...)
> Because of thread count differences, -t1 (one thread) test is much more
> loaded on client side than it is with -t10 (ten threads).

This could indicate a sensitivity to timing then: the more requests,
the higher the load on the servers, the higher the response time and the
more likely the transfer is killed before getting everything.

> Random samples from HAproxy log (proper size of the object in HAproxy logs
> is 667960042 bytes for that test file).
> 0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
> 0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

So indeed these ones are not 502 but plain 200. Do you think the timings
match the next data chunk sent by the server for example ? It's interesting
to see that the response headers are really immediate here. If the timers
had all been the same, we could have imagined that an earlier timer fired
(e.g. the connect timer).

> Test run directly against HTTP server do not return any errors in any
> combination of test load, that's why we are suspecting that bug is in
> HAproxy.

Could you also check if this happens only/more with keep-alive, close or
server-close ?

> Could it be that HAproxy is somehow ending the wrong connection or counting bytes
> transferred improperly?

What could happen is that the server sends malformed chunks that are
rejected by the parser, or that it advertises chunked encoding but does
not use it when sending the response. But that would be very strange as
it should not depend on the load at all.

If you can run more tests in your test environment, I'd be interested in
seeing how latest 2.0-dev works with these variants :
  - http-reuse never
  - http-reuse always

  - option httpclose
  - option http-server-close
  - option keep-alive

I'm asking for 2.0-dev because it's where all known bugs are fixed.
If none of these settings has any effect, we'll have to look at network
traces I'm afraid.

Regards,
Willy



Re: Early connection close, incomplete transfers

2019-02-01 Thread Veiko Kukk

On 2019-02-01 12:34, Aleksandar Lazic wrote:


Do you have any errors in lighthttpds log?


Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll 
return 1, send return -1

ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never happens 
when using "mode tcp".


Regards,
Veiko



Re: Early connection close, incomplete transfers

2019-02-01 Thread Aleksandar Lazic
Hi.

Do you have any errors in lighthttpds log?

Regards
Aleks


 Original Message 
From: Veiko Kukk 
Sent: 1 February 2019 12:33:39 CET
To: Aleksandar Lazic 
CC: haproxy@formilux.org
Subject: Re: Early connection close, incomplete transfers


On 2019-01-31 12:57, Aleksandar Lazic wrote:
> Willy has found some issues for which fixes have been added to the 2.0 tree.
> Do you have a chance to test this branch or do you want to wait for
> the next 1.9 release?

I tested stable 1.9.3 and 1.9 preview version Willy gave link here 
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html
There is no difference in my tests.

> I'm not sure if it affects you as we haven't seen the config yet.
> Maybe you can share your config also so that we can see if your setup
> could be affected.

The commented-out timeouts are the original timeouts; I had increased those to 
make sure I'm not hitting any timeouts when creating higher load with tests. 
Maxconn values  serve the same purpose.

global
   log /dev/log local0
   daemon
   nbproc 1
   nbthread 16
   maxconn 
   user haproxy
   spread-checks 5
   tune.ssl.default-dh-param 2048
   ssl-default-bind-options no-sslv3 no-tls-tickets
   ssl-default-bind-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
   ssl-default-server-options no-sslv3 no-tls-tickets
   ssl-default-server-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
   tune.ssl.cachesize 10
   tune.ssl.lifetime 1800
   stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 
1 level admin

defaults
   log global
   mode http
   option httplog
   option contstats
   option log-health-checks
   retries 5
   #timeout http-request 5s
   timeout http-request 99s
   #timeout http-keep-alive 20s
   timeout http-keep-alive 99s
   #timeout connect 10s
   timeout connect 99s
   #timeout client 30s
   timeout client 99s
   timeout server 120s
   #timeout client-fin 10s
   timeout client-fin 99s
   #timeout server-fin 10s
   timeout server-fin 99s

listen main_frontend
   bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ 
maxconn 
   bind *:80 maxconn 
   option forwardfor
   acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
   monitor-uri /load_balance_health
   monitor fail if local_lighty_down
   default_backend lighty_load_balancer

backend lighty_load_balancer
   stats enable
   stats realm statistics
   http-response set-header Access-Control-Allow-Origin *
   option httpchk HEAD /dl/index.html
   server lighty0 127.0.0.1:9000 check maxconn  fall 2 inter 15s rise 
5 id 1

Test results

httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 
bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 
bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 
bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 
bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 
bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 
bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 
bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 
bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 
bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 
bytes, 448317 overhead

TOTALS:  1 connect, 1 requests, 9809 success, 191 fail, 100 
(100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 
overhead
TIMING:  81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time


HAproxy log sections of incomplete transfers (6320535 bytes should be 
transferred with this test data set):
  127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 
100/100/99/99/0 0/0 "
  127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 
0/0 "
  127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 
0/0 "
  127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 
0/0

Re: Early connection close, incomplete transfers

2019-02-01 Thread Veiko Kukk



On 2019-01-31 12:57, Aleksandar Lazic wrote:

Willy has found some issues for which fixes have been added to the 2.0 tree.
Do you have a chance to test this branch or do you want to wait for
the next 1.9 release?


I tested stable 1.9.3 and 1.9 preview version Willy gave link here 
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html

There is no difference in my tests.


I'm not sure if it affects you as we haven't seen the config yet.
Maybe you can share your config also so that we can see if your setup
could be affected.


The commented-out timeouts are the original timeouts; I had increased those to 
make sure I'm not hitting any timeouts when creating higher load with tests. 
Maxconn values  serve the same purpose.


global
  log /dev/log local0
  daemon
  nbproc 1
  nbthread 16
  maxconn 
  user haproxy
  spread-checks 5
  tune.ssl.default-dh-param 2048
  ssl-default-bind-options no-sslv3 no-tls-tickets
  ssl-default-bind-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS

  ssl-default-server-options no-sslv3 no-tls-tickets
  ssl-default-server-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS

  tune.ssl.cachesize 10
  tune.ssl.lifetime 1800
  stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 
1 level admin


defaults
  log global
  mode http
  option httplog
  option contstats
  option log-health-checks
  retries 5
  #timeout http-request 5s
  timeout http-request 99s
  #timeout http-keep-alive 20s
  timeout http-keep-alive 99s
  #timeout connect 10s
  timeout connect 99s
  #timeout client 30s
  timeout client 99s
  timeout server 120s
  #timeout client-fin 10s
  timeout client-fin 99s
  #timeout server-fin 10s
  timeout server-fin 99s

listen main_frontend
  bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ 
maxconn 

  bind *:80 maxconn 
  option forwardfor
  acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
  monitor-uri /load_balance_health
  monitor fail if local_lighty_down
  default_backend lighty_load_balancer

backend lighty_load_balancer
  stats enable
  stats realm statistics
  http-response set-header Access-Control-Allow-Origin *
  option httpchk HEAD /dl/index.html
  server lighty0 127.0.0.1:9000 check maxconn  fall 2 inter 15s rise 
5 id 1


Test results

httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 
bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 
bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 
bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 
bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 
bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 
bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 
bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 
bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 
bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 
bytes, 448317 overhead


TOTALS:  1 connect, 1 requests, 9809 success, 191 fail, 100 
(100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 
overhead

TIMING:  81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time


HAproxy log sections of incomplete transfers (6320535 bytes should be 
transferred with this test data set):
 127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 
100/100/99/99/0 0/0 "
 127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 
0/0 "
 127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 
0/0 "
 127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 
0/0 "
 127.0.0.1:34658 [01/Feb/2019:11:22:49.447] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/254 200 425 - - SD-- 
100/100/98/98/0 0/0 "
 127.0.0.1:34386 [01/Feb/2019:11:22:49.290] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/412 200 425 - - SD-- 
100/100/98/98/0 0/0 "
 127.0.0.1:34388 [01/Feb/2019:11:22:49.290] main_frontend 

Re: Early connection close, incomplete transfers

2019-01-31 Thread Aleksandar Lazic
Hi.

On 31.01.2019 at 10:29, Veiko Kukk wrote:
> HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.
> 
> Connections are getting closed during data transfer phase at random sizes on 
> backend. Sometimes just as little as 420 bytes get transferred, but usually 
> more is transferred before sudden end of connection. HAproxy logs have 
> connection closing status SD-- when this happens.

Willy has found some issues for which fixes have been added to the 2.0 tree.
Do you have a chance to test this branch or do you want to wait for the next 
1.9 release?

I'm not sure if it affects you as we haven't seen the config yet.
Maybe you can share your config also so that we can see if your setup could be 
affected.

Best regards
Aleks

> Basic components of system look like this:
> Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin
> 
> Our HTTP server part is compiling data from chunks it gets from local cache. 
> When it receives a request from a client via HAproxy, it sends the response header, 
> then fetches chunks, compiles those and sends the data to the client.
> 
> SD-- happens more frequently when connection between benchmarking tool and 
> HAproxy is fast, e.g. when doing tests where client side is not loaded much. 
> Happens much more for http than for https.
> 
> For example:
> 
> httpress -t1 -c10 -n1000 URL (rarely or not at all)
> 250 requests launched
> 500 requests launched
> 750 requests launched
> 1000 requests launched
> 
> TOTALS:  1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real 
> concurrency
> TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 452000 
> overhead
> TIMING:  241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time
> 
> httpress -t10 -c10 -n1000 URL (happens frequently)
> 
> 2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
> 2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
> 1000 requests launched
> 2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection closed
> thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 
> 32544 overhead
> thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 bytes, 
> 32544 overhead
> thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 
> 32544 overhead
> thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 bytes, 
> 30284 overhead
> thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 bytes, 
> 25312 overhead
> thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 bytes, 
> 30736 overhead
> thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 bytes, 
> 30736 overhead
> thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 bytes, 
> 13108 overhead
> thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 bytes, 
> 32996 overhead
> thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 bytes, 
> 17628 overhead
> 
> TOTALS:  1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) real 
> concurrency
> TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 278432 
> overhead
> TIMING:  170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time
> 
> Because of thread count differences, -t1 (one thread) test is much more 
> loaded on client side than it is with -t10 (ten threads).
> 
> Random samples from HAproxy log (proper size of the object in HAproxy logs is 
> 667960042 bytes for that test file).
> 0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
> 0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
> 0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0
> 
> Our HTTP server logs contain hard unrecoverable errors about being unable to write 
> to the socket when HAproxy closes the connection:
> Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
> Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 msec
> Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 msec
> Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 msec
> Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 msec
> Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 msec
> Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 msec
> Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 msec
> 

Early connection close, incomplete transfers

2019-01-31 Thread Veiko Kukk

HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.

Connections are getting closed during the data transfer phase at random 
sizes on the backend. Sometimes just as little as 420 bytes get transferred, 
but usually more is transferred before the sudden end of the connection. 
HAproxy logs have connection closing status SD-- when this happens.


Basic components of system look like this:
Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin

Our HTTP server part is compiling data from chunks it gets from local 
cache. When it receives a request from a client via HAproxy, it sends the 
response header, then fetches chunks, compiles those and sends the data 
to the client.


SD-- happens more frequently when the connection between the benchmarking 
tool and HAproxy is fast, e.g. when doing tests where the client side is not 
loaded much. It happens much more for http than for https.


For example:

httpress -t1 -c10 -n1000 URL (rarely or not at all)
250 requests launched
500 requests launched
750 requests launched
1000 requests launched

TOTALS:  1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real 
concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 
452000 overhead

TIMING:  241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time

httpress -t10 -c10 -n1000 URL (happens frequently)

2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection 
closed

1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection 
closed
thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 
bytes, 32544 overhead
thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 
bytes, 32544 overhead
thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 
bytes, 32544 overhead
thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 
bytes, 30284 overhead
thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 
bytes, 25312 overhead
thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 
bytes, 30736 overhead
thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 
bytes, 30736 overhead
thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 
bytes, 13108 overhead
thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 
bytes, 32996 overhead
thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 
bytes, 17628 overhead


TOTALS:  1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) 
real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 
278432 overhead

TIMING:  170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time

Because of thread count differences, the -t1 (one thread) test is much more 
loaded on the client side than it is with -t10 (ten threads).


Random samples from HAproxy log (proper size of the object in HAproxy 
logs is 667960042 bytes for that test file).

0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

Our HTTP server logs contain hard unrecoverable errors about being unable 
to write to the socket when HAproxy closes the connection:

Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 
msec
Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 
msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 
msec
Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 
msec
Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 
msec
Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 
msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979 
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087 
msec
Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918 
msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977 
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079 
msec
Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060 
msec
Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881 
msec
Return Code: 32.