Re: HAProxy does not write 504 on keep-alive connections

2015-11-13 Thread Laurent Senta
Done!

Thanks a lot Willy,

On Thu, Nov 12, 2015 at 8:26 PM Willy Tarreau  wrote:

> Hi Laurent,
>
> On Thu, Nov 12, 2015 at 06:44:42PM +, Laurent Senta wrote:
> > >
> > >
> > > > I traced back that change to:
> > > >
> > >
> http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab
> > > >
> > > > I don't understand why it's saner to kill the connection and hide
> the 504
> > > > instead of clearly stating the error and let the application handle
> the
> > > > timeout.
> > >
> > > As explained above, it's because a keep-alive enabled client must
> implement
> > > the ability to replay requests for which it didn't get a response
> because
> > > the connection died. In fact we're forwarding to the client what we
> saw on
> > > the server side so that the client can take the correct decision. If
> your
> > > client was directly connected to the server, it would have seen the
> exact
> > > same behaviour.
> > >
> >
> > That's where it gets confusing. HAProxy treats a server timeout on the
> > first request and the nth request differently and I don't see why.
> >
> > There are no semantic differences between a timeout on the first or on
> the
> > second request of the keep-alive connection. If I see a 504 when it's the
> > first request, why don't I see a 504 when it's the second?
>
> That's what I said in the thread before the paragraph you quoted above, I
> do
> think that the 504 should always be sent because I don't see how it could
> be related to a keep-alive connection failure, except in environments
> where keep-alive timeout is configured longer than firewall timeouts, but
> even then you'd prefer to log it so that it can be fixed.
>
> However all other cases (aborted connections with no data) must remain
> silent by default.
>
> I'm willing to change this and even to backport the change into 1.5 as I
> admit it doesn't provide value and only hides real trouble. I asked if
> people were OK with my proposal but instead I got comments about why we
> are doing it like this right now :-/
>
> > It's not related to RFCs, I know they say almost nothing on keep-alive
> > connection getting closed.
>
> The retry rules are explained in 6.3.1 of RFC7230. There's not much about
> this but it's mostly based on common sense once you have ingurgited all
> the stuff around this section so there's not much more than needs be said.
>
> > I'm not advocating having this as the default behavior (even though I
> think
> > the current default is wrong and it should be documented), but at least
> > having this as an option, or include it in http-server-close if it even
> > makes sense.
>
> For the 504 I'm definitely willing to fix the default behaviour. For the
> other ones, we definitely cannot change it without an option. And I really
> guess this option will only result in misuses or will never be used. So
> let's do it for 504 only.
>
> Do you want to submit a patch for this with a detailed enough description
> so that if someone suffers from the behaviour change we have all the
> context ?
>
> Thanks,
> Willy
>
>


Re: HAProxy does not write 504 on keep-alive connections

2015-11-12 Thread Laurent Senta
>
>
> > I traced back that change to:
> >
> http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab
> >
> > I don't understand why it's saner to kill the connection and hide the 504
> > instead of clearly stating the error and let the application handle the
> > timeout.
>
> As explained above, it's because a keep-alive enabled client must implement
> the ability to replay requests for which it didn't get a response because
> the connection died. In fact we're forwarding to the client what we saw on
> the server side so that the client can take the correct decision. If your
> client was directly connected to the server, it would have seen the exact
> same behaviour.
>

That's where it gets confusing. HAProxy treats a server timeout on the
first request and the nth request differently and I don't see why.

There are no semantic differences between a timeout on the first or on the
second request of the keep-alive connection. If I see a 504 when it's the
first request, why don't I see a 504 when it's the second?

>From your explanation, I understand that HAProxy closes the connection
abruptly only to force the client to retry which is either unsafe or not
possible in most cases.
The only solution for the client is to deal with the consequences of a
"remote disconnected" error, which is false and has much less information
than a 504 timeout.

What's even worse is that with `http-server-close`, so that HAProxy uses a
brand new connection with the server, it'll still kills the client
connection instead of "forwarding" what it sees as a timeout.

If my client was directly connected to the server it would have hanged. I
configure haproxy timeout to avoid that and trigger a timeout.


It's not related to RFCs, I know they say almost nothing on keep-alive
connection getting closed.


I'm not advocating having this as the default behavior (even though I think
the current default is wrong and it should be documented), but at least
having this as an option, or include it in http-server-close if it even
makes sense.

Bests


Re: HAProxy does not write 504 on keep-alive connections

2015-11-11 Thread Holger Just
Hi,

Willy Tarreau wrote:
> As explained above, it's because a keep-alive enabled client must implement
> the ability to replay requests for which it didn't get a response because
> the connection died. In fact we're forwarding to the client what we saw on
> the server side so that the client can take the correct decision. If your
> client was directly connected to the server, it would have seen the exact
> same behaviour.

One of the problems we saw when trying to reproduce this issue was that
some clients I tried (i.e. curl 7.45.0 and ruby's httpclient gem)
silently replayed any requests for which they didn't receive an answer.

This can result duplicated POSTs on the backend servers. Often, servers
continue to handle the first POST, even after HAProxy closed the backend
connection because its server timeout stroke. Now, the client just
replays the POST again resulting in potentially fatal behavior.

If I understand the HTTP specs correctly, this replay is correct from
the clients perspective as they can't know if they are speaking to a
loadbalancer or an origin server directly.

As a loadbalancer however, HAProxy should always return a proper HTTP
error if the request was even partially forwarded to the server. It's
probably fine to just close the connection if the connect timeout stroke
and the request was never actually handled anywhere but it should
definitely return a real HTTP error if its the sever timeout and a
backend server started doing anything with a request.

You could probably argue to differentiate between safe and unsafe
methods and to also just close for safe ones but is probably even more
confusing and has the potential for subtle bugs.

Best,
Holger



Re: HAProxy does not write 504 on keep-alive connections

2015-11-11 Thread Bryan Talbot
On Wed, Nov 11, 2015 at 6:47 AM, Holger Just  wrote:

>
> As a loadbalancer however, HAProxy should always return a proper HTTP
> error if the request was even partially forwarded to the server. It's
> probably fine to just close the connection if the connect timeout stroke
> and the request was never actually handled anywhere but it should
> definitely return a real HTTP error if its the sever timeout and a
> backend server started doing anything with a request.
>
>
This would be my preferred behavior and actually what I thought haproxy was
already doing.

-Bryan


Re: HAProxy does not write 504 on keep-alive connections

2015-11-11 Thread Willy Tarreau
On Wed, Nov 11, 2015 at 06:55:11PM -0800, Bryan Talbot wrote:
> On Wed, Nov 11, 2015 at 6:47 AM, Holger Just  wrote:
> 
> >
> > As a loadbalancer however, HAProxy should always return a proper HTTP
> > error if the request was even partially forwarded to the server. It's
> > probably fine to just close the connection if the connect timeout stroke
> > and the request was never actually handled anywhere but it should
> > definitely return a real HTTP error if its the sever timeout and a
> > backend server started doing anything with a request.
> >
> >
> This would be my preferred behavior and actually what I thought haproxy was
> already doing.

Guys, please read the HTTP RFC, you *can't* do that by default. HTTP/1 doesn't
warn before closing an idle keep-alive connection. So if you send a request
over an existing connection and the server closes at the same time, you get
exactly the situation above. And you clearly don't want to send a 502 or 504
to a client because it will be displayed on the browser. Remember the issues
we had with Chrome's preconnect and 408 ? That would be the same. We had to
silence the 408 on keep-alive connections to the client so that the browser
could replay. Here it's the same, by silently closing, we're telling the
browser it should retry, and we're ensuring not to interfere between the
browser and the server regarding the connection's behaviour.

And it's the browser's responsibility to only retry safe requests (those
that are called "idempotent"). Normally it does this by selecting which
requests can be sent over an existing connection. That ensures a non-
idempotent request cannot hit a closed connection. In web services
environments, in order to address this, you often see the requests sent
in two parts, first a POST is emitted with an Expect: 100-continue, and
only once the server responds, the body is emitted (which ensures that
the connection is still alive). Note by the way that it requires two
round trips to do this so there's little benefit to keeping persistent
connections in such a case.

You have the exact same situation between the browser and haproxy, or the
browser and the server. When haproxy or the server closes a keep-alive
connection, the browser doesn't know whether the request was being processed
or not, and that's the reason why it (normally) doesn't send unsafe request
over existing connections.

I'm not opposed to having an option to make these errors verbose for web
services environments, but be prepared to hear end-users complain if you
enable this with browsers-facing web sites, because your users will
unexpectedly get some 502 or 504.

Yes the persistent connection model in HTTP/1 is far from being perfect,
and that's one of the reasons it was changed in HTTP/2.

Willy




Re: HAProxy does not write 504 on keep-alive connections

2015-11-11 Thread Willy Tarreau
Hi,

On Wed, Nov 11, 2015 at 03:46:38AM +, Laurent Senta wrote:
> Thanks for the reply guys,
> after investigating the source code, it looks like this behavior is wanted.

Yes, it is mandated by the way keep-alive works in HTTP. You never know when
the connection will abort, and at the moment you reuse a connection, it may
fail on you. People don't want to see in their browser errors that are not
real errors and just a result of normal traffic, instead the browser knows
it must retry because that situation is expected. But it will not retry if
it receives a valid response (and an error is a valid response).

> I've been able to "fix it" by removing these two lines:
> 
> diff --git a/src/proto_http.c b/src/proto_http.c
> index 2dcac06..d33b4a1 100644
> --- a/src/proto_http.c
> +++ b/src/proto_http.c
> @@ -6125,8 +6125,6 @@ int http_wait_for_response(struct stream *s, struct
> channel *rep, int an_bit)
> else if (rep->flags & CF_READ_TIMEOUT) {
> if (msg->err_pos >= 0)
> 
> http_capture_bad_message(>be->invalid_rep, s, msg, msg->msg_state,
> sess->fe);
> -   else if (txn->flags & TX_NOT_FIRST)
> -   goto abort_keep_alive;

I would argue that we could possibly relax this for timeouts indeed.
Someone who configures haproxy's keep-alive timeout to a value lower
than the surrounding firewalls' timeouts is seeking trouble. And the
bad was already done by making the client wait. So probably we'd
rather report the 504 here.

Would it be enough for everyone if we just removed this one or do we
need something more configurable like Tait's patch ? I think we could
have something with verbosity levels :
  - act the most transparently possible regarding our local issues (for
browsers), which means replicate on the client side what we see on
the server side (eg: close when we get a close).
  - maybe report errors in logs but still silently close the connection
  - only report suspicious errors (eg: 504 here) to the client
  - report them all (eg: for webservices where this helps detect a
failing server)

Before rushing on a patch we should also consider this with the http-reuse
that was introduced in 1.6, to ensure we don't end up with something ugly.

> I traced back that change to:
> http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab
> 
> I don't understand why it's saner to kill the connection and hide the 504
> instead of clearly stating the error and let the application handle the
> timeout.

As explained above, it's because a keep-alive enabled client must implement
the ability to replay requests for which it didn't get a response because
the connection died. In fact we're forwarding to the client what we saw on
the server side so that the client can take the correct decision. If your
client was directly connected to the server, it would have seen the exact
same behaviour.

Regards,
Willy




Re: HAProxy does not write 504 on keep-alive connections

2015-11-10 Thread Bryan Talbot
On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta 
wrote:

> Hi there,
> I think there's a bug in how HAProxy handles timeout, that'd be great if
> you can confirm or help me figure out what I do wrong:
>
> Basically: if a server timeout happens on a keep-alive connection, haproxy
> does not write a 504 response before closing the socket.
>

I am able to reproduce this behavior too. I tried a few versions of haproxy
all the way back to 1.4.0 and they all did this.

The general way to reproduce is:
1) open tcp connection
2) make request that completes before 'timeout server' / 'timeout client'
3) make request that does NOT complete before 'timeout server' on same tcp
connection

The second request gets no response and the connection is just closed. If
the same request (#3) is made on a new tcp connection that did not have a
previously successful response, the response is a 504 and not the silently
closed connection.

The haproxy.cfg I'm using is

global
maxconn 4096

defaults
mode http
timeout connect 5s
timeout client 450ms
timeout server 450ms

listen http
bind :8000
server slowserv 127.0.0.1:8002


And 'slowserv' simply sleeps for the amount of time requested through a
query string parameter.

Note that curl and httpclient it seems is not good to test in this
situation because both do retry the request that receives an empty response
-- even if that request was a POST.

I'm able to see the successful and empty-responses using wget like this by
forcing wget to not retry (only try once). In these cases, the backend will
take 0.4 seconds and 0.5 seconds to respond (respectively).  The timeout
server is configured to strike at 0.45 seconds.

$> wget --tries 1 -O /dev/null "http://127.0.0.1:8000/?0.4; "
http://127.0.0.1:8000/?0.5;





>
> This leads python to fail with a serious BadStatusLineError instead of a
> simple http error.
> And Ruby to retry potentially non-idempotent methods.
>
> Here's a basic setup to reproduce the error:
> https://gist.github.com/lsenta/1d33c6a01c07b32ac18a
>
> I've also had some help by meineerde on irc, here's the haproxy logs with
> a ruby client doing the same request:
> https://gist.github.com/meineerde/87a571c57369d322dae0#gistcomment-1617687
>
>

In case there is some confusion, the comment about "6 gets instead of 5" is
due to the ruby httpclient or curl retrying on that end and is not being
re-tried by haproxy.



> I've seen this behavior with v.1.6.2, 1.5.15 and 1.6.0
>


and several 1.5.x, 1.4.26, 1.4.6, and 1.4.0 all the same.

-Bryan


Re: HAProxy does not write 504 on keep-alive connections

2015-11-10 Thread Tait Clarridge
On Tue, Nov 10, 2015 at 7:32 PM, Bryan Talbot  wrote:
> On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta 
> wrote:
>>
>> Hi there,
>> I think there's a bug in how HAProxy handles timeout, that'd be great if
>> you can confirm or help me figure out what I do wrong:
>>
>> Basically: if a server timeout happens on a keep-alive connection, haproxy
>> does not write a 504 response before closing the socket.
>
>

I believe this is the same behaviour that I submitted a patch (without
a lot of context in the description) for in
https://www.marc.info/?l=haproxy=141822428718492=4

We have been running a patched version of 1.5.x with this applied in
our environment for almost a year, which although it has a very
specific use case and does not touch all the features of the project,
has had zero issues with leaking sockets/memory (or segfaults).



Re: HAProxy does not write 504 on keep-alive connections

2015-11-10 Thread Laurent Senta
Thanks for the reply guys,
after investigating the source code, it looks like this behavior is wanted.

I've been able to "fix it" by removing these two lines:

diff --git a/src/proto_http.c b/src/proto_http.c
index 2dcac06..d33b4a1 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -6125,8 +6125,6 @@ int http_wait_for_response(struct stream *s, struct
channel *rep, int an_bit)
else if (rep->flags & CF_READ_TIMEOUT) {
if (msg->err_pos >= 0)

http_capture_bad_message(>be->invalid_rep, s, msg, msg->msg_state,
sess->fe);
-   else if (txn->flags & TX_NOT_FIRST)
-   goto abort_keep_alive;

s->be->be_counters.failed_resp++;
if (objt_server(s->target)) {

I traced back that change to:
http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab

I don't understand why it's saner to kill the connection and hide the 504
instead of clearly stating the error and let the application handle the
timeout.



On Wed, Nov 11, 2015 at 3:11 AM Tait Clarridge  wrote:

> On Tue, Nov 10, 2015 at 7:32 PM, Bryan Talbot 
> wrote:
> > On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta  >
> > wrote:
> >>
> >> Hi there,
> >> I think there's a bug in how HAProxy handles timeout, that'd be great if
> >> you can confirm or help me figure out what I do wrong:
> >>
> >> Basically: if a server timeout happens on a keep-alive connection,
> haproxy
> >> does not write a 504 response before closing the socket.
> >
> >
>
> I believe this is the same behaviour that I submitted a patch (without
> a lot of context in the description) for in
> https://www.marc.info/?l=haproxy=141822428718492=4
>
> We have been running a patched version of 1.5.x with this applied in
> our environment for almost a year, which although it has a very
> specific use case and does not touch all the features of the project,
> has had zero issues with leaking sockets/memory (or segfaults).
>
>