Re: HAProxy does not write 504 on keep-alive connections
Done! Thanks a lot Willy, On Thu, Nov 12, 2015 at 8:26 PM Willy Tarreau wrote: > Hi Laurent, > > On Thu, Nov 12, 2015 at 06:44:42PM +, Laurent Senta wrote: > > > > > > > > > > I traced back that change to: > > > > > > > > http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab > > > > > > > > I don't understand why it's saner to kill the connection and hide > the 504 > > > > instead of clearly stating the error and let the application handle > the > > > > timeout. > > > > > > As explained above, it's because a keep-alive enabled client must > implement > > > the ability to replay requests for which it didn't get a response > because > > > the connection died. In fact we're forwarding to the client what we > saw on > > > the server side so that the client can take the correct decision. If > your > > > client was directly connected to the server, it would have seen the > exact > > > same behaviour. > > > > > > > That's where it gets confusing. HAProxy treats a server timeout on the > > first request and the nth request differently and I don't see why. > > > > There are no semantic differences between a timeout on the first or on > the > > second request of the keep-alive connection. If I see a 504 when it's the > > first request, why don't I see a 504 when it's the second? > > That's what I said in the thread before the paragraph you quoted above, I > do > think that the 504 should always be sent because I don't see how it could > be related to a keep-alive connection failure, except in environments > where keep-alive timeout is configured longer than firewall timeouts, but > even then you'd prefer to log it so that it can be fixed. > > However all other cases (aborted connections with no data) must remain > silent by default. > > I'm willing to change this and even to backport the change into 1.5 as I > admit it doesn't provide value and only hides real trouble. I asked if > people were OK with my proposal but instead I got comments about why we > are doing it like this right now :-/ > > > It's not related to RFCs, I know they say almost nothing on keep-alive > > connection getting closed. > > The retry rules are explained in 6.3.1 of RFC7230. There's not much about > this but it's mostly based on common sense once you have ingurgited all > the stuff around this section so there's not much more than needs be said. > > > I'm not advocating having this as the default behavior (even though I > think > > the current default is wrong and it should be documented), but at least > > having this as an option, or include it in http-server-close if it even > > makes sense. > > For the 504 I'm definitely willing to fix the default behaviour. For the > other ones, we definitely cannot change it without an option. And I really > guess this option will only result in misuses or will never be used. So > let's do it for 504 only. > > Do you want to submit a patch for this with a detailed enough description > so that if someone suffers from the behaviour change we have all the > context ? > > Thanks, > Willy > >
Re: HAProxy does not write 504 on keep-alive connections
Hi Laurent, On Thu, Nov 12, 2015 at 06:44:42PM +, Laurent Senta wrote: > > > > > > > I traced back that change to: > > > > > http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab > > > > > > I don't understand why it's saner to kill the connection and hide the 504 > > > instead of clearly stating the error and let the application handle the > > > timeout. > > > > As explained above, it's because a keep-alive enabled client must implement > > the ability to replay requests for which it didn't get a response because > > the connection died. In fact we're forwarding to the client what we saw on > > the server side so that the client can take the correct decision. If your > > client was directly connected to the server, it would have seen the exact > > same behaviour. > > > > That's where it gets confusing. HAProxy treats a server timeout on the > first request and the nth request differently and I don't see why. > > There are no semantic differences between a timeout on the first or on the > second request of the keep-alive connection. If I see a 504 when it's the > first request, why don't I see a 504 when it's the second? That's what I said in the thread before the paragraph you quoted above, I do think that the 504 should always be sent because I don't see how it could be related to a keep-alive connection failure, except in environments where keep-alive timeout is configured longer than firewall timeouts, but even then you'd prefer to log it so that it can be fixed. However all other cases (aborted connections with no data) must remain silent by default. I'm willing to change this and even to backport the change into 1.5 as I admit it doesn't provide value and only hides real trouble. I asked if people were OK with my proposal but instead I got comments about why we are doing it like this right now :-/ > It's not related to RFCs, I know they say almost nothing on keep-alive > connection getting closed. The retry rules are explained in 6.3.1 of RFC7230. There's not much about this but it's mostly based on common sense once you have ingurgited all the stuff around this section so there's not much more than needs be said. > I'm not advocating having this as the default behavior (even though I think > the current default is wrong and it should be documented), but at least > having this as an option, or include it in http-server-close if it even > makes sense. For the 504 I'm definitely willing to fix the default behaviour. For the other ones, we definitely cannot change it without an option. And I really guess this option will only result in misuses or will never be used. So let's do it for 504 only. Do you want to submit a patch for this with a detailed enough description so that if someone suffers from the behaviour change we have all the context ? Thanks, Willy
Re: HAProxy does not write 504 on keep-alive connections
> > > > I traced back that change to: > > > http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab > > > > I don't understand why it's saner to kill the connection and hide the 504 > > instead of clearly stating the error and let the application handle the > > timeout. > > As explained above, it's because a keep-alive enabled client must implement > the ability to replay requests for which it didn't get a response because > the connection died. In fact we're forwarding to the client what we saw on > the server side so that the client can take the correct decision. If your > client was directly connected to the server, it would have seen the exact > same behaviour. > That's where it gets confusing. HAProxy treats a server timeout on the first request and the nth request differently and I don't see why. There are no semantic differences between a timeout on the first or on the second request of the keep-alive connection. If I see a 504 when it's the first request, why don't I see a 504 when it's the second? >From your explanation, I understand that HAProxy closes the connection abruptly only to force the client to retry which is either unsafe or not possible in most cases. The only solution for the client is to deal with the consequences of a "remote disconnected" error, which is false and has much less information than a 504 timeout. What's even worse is that with `http-server-close`, so that HAProxy uses a brand new connection with the server, it'll still kills the client connection instead of "forwarding" what it sees as a timeout. If my client was directly connected to the server it would have hanged. I configure haproxy timeout to avoid that and trigger a timeout. It's not related to RFCs, I know they say almost nothing on keep-alive connection getting closed. I'm not advocating having this as the default behavior (even though I think the current default is wrong and it should be documented), but at least having this as an option, or include it in http-server-close if it even makes sense. Bests
Re: HAProxy does not write 504 on keep-alive connections
On Wed, Nov 11, 2015 at 06:55:11PM -0800, Bryan Talbot wrote: > On Wed, Nov 11, 2015 at 6:47 AM, Holger Just wrote: > > > > > As a loadbalancer however, HAProxy should always return a proper HTTP > > error if the request was even partially forwarded to the server. It's > > probably fine to just close the connection if the connect timeout stroke > > and the request was never actually handled anywhere but it should > > definitely return a real HTTP error if its the sever timeout and a > > backend server started doing anything with a request. > > > > > This would be my preferred behavior and actually what I thought haproxy was > already doing. Guys, please read the HTTP RFC, you *can't* do that by default. HTTP/1 doesn't warn before closing an idle keep-alive connection. So if you send a request over an existing connection and the server closes at the same time, you get exactly the situation above. And you clearly don't want to send a 502 or 504 to a client because it will be displayed on the browser. Remember the issues we had with Chrome's preconnect and 408 ? That would be the same. We had to silence the 408 on keep-alive connections to the client so that the browser could replay. Here it's the same, by silently closing, we're telling the browser it should retry, and we're ensuring not to interfere between the browser and the server regarding the connection's behaviour. And it's the browser's responsibility to only retry safe requests (those that are called "idempotent"). Normally it does this by selecting which requests can be sent over an existing connection. That ensures a non- idempotent request cannot hit a closed connection. In web services environments, in order to address this, you often see the requests sent in two parts, first a POST is emitted with an Expect: 100-continue, and only once the server responds, the body is emitted (which ensures that the connection is still alive). Note by the way that it requires two round trips to do this so there's little benefit to keeping persistent connections in such a case. You have the exact same situation between the browser and haproxy, or the browser and the server. When haproxy or the server closes a keep-alive connection, the browser doesn't know whether the request was being processed or not, and that's the reason why it (normally) doesn't send unsafe request over existing connections. I'm not opposed to having an option to make these errors verbose for web services environments, but be prepared to hear end-users complain if you enable this with browsers-facing web sites, because your users will unexpectedly get some 502 or 504. Yes the persistent connection model in HTTP/1 is far from being perfect, and that's one of the reasons it was changed in HTTP/2. Willy
Re: HAProxy does not write 504 on keep-alive connections
On Wed, Nov 11, 2015 at 6:47 AM, Holger Just wrote: > > As a loadbalancer however, HAProxy should always return a proper HTTP > error if the request was even partially forwarded to the server. It's > probably fine to just close the connection if the connect timeout stroke > and the request was never actually handled anywhere but it should > definitely return a real HTTP error if its the sever timeout and a > backend server started doing anything with a request. > > This would be my preferred behavior and actually what I thought haproxy was already doing. -Bryan
Re: HAProxy does not write 504 on keep-alive connections
Hi, Willy Tarreau wrote: > As explained above, it's because a keep-alive enabled client must implement > the ability to replay requests for which it didn't get a response because > the connection died. In fact we're forwarding to the client what we saw on > the server side so that the client can take the correct decision. If your > client was directly connected to the server, it would have seen the exact > same behaviour. One of the problems we saw when trying to reproduce this issue was that some clients I tried (i.e. curl 7.45.0 and ruby's httpclient gem) silently replayed any requests for which they didn't receive an answer. This can result duplicated POSTs on the backend servers. Often, servers continue to handle the first POST, even after HAProxy closed the backend connection because its server timeout stroke. Now, the client just replays the POST again resulting in potentially fatal behavior. If I understand the HTTP specs correctly, this replay is correct from the clients perspective as they can't know if they are speaking to a loadbalancer or an origin server directly. As a loadbalancer however, HAProxy should always return a proper HTTP error if the request was even partially forwarded to the server. It's probably fine to just close the connection if the connect timeout stroke and the request was never actually handled anywhere but it should definitely return a real HTTP error if its the sever timeout and a backend server started doing anything with a request. You could probably argue to differentiate between safe and unsafe methods and to also just close for safe ones but is probably even more confusing and has the potential for subtle bugs. Best, Holger
Re: HAProxy does not write 504 on keep-alive connections
Hi, On Wed, Nov 11, 2015 at 03:46:38AM +, Laurent Senta wrote: > Thanks for the reply guys, > after investigating the source code, it looks like this behavior is wanted. Yes, it is mandated by the way keep-alive works in HTTP. You never know when the connection will abort, and at the moment you reuse a connection, it may fail on you. People don't want to see in their browser errors that are not real errors and just a result of normal traffic, instead the browser knows it must retry because that situation is expected. But it will not retry if it receives a valid response (and an error is a valid response). > I've been able to "fix it" by removing these two lines: > > diff --git a/src/proto_http.c b/src/proto_http.c > index 2dcac06..d33b4a1 100644 > --- a/src/proto_http.c > +++ b/src/proto_http.c > @@ -6125,8 +6125,6 @@ int http_wait_for_response(struct stream *s, struct > channel *rep, int an_bit) > else if (rep->flags & CF_READ_TIMEOUT) { > if (msg->err_pos >= 0) > > http_capture_bad_message(&s->be->invalid_rep, s, msg, msg->msg_state, > sess->fe); > - else if (txn->flags & TX_NOT_FIRST) > - goto abort_keep_alive; I would argue that we could possibly relax this for timeouts indeed. Someone who configures haproxy's keep-alive timeout to a value lower than the surrounding firewalls' timeouts is seeking trouble. And the bad was already done by making the client wait. So probably we'd rather report the 504 here. Would it be enough for everyone if we just removed this one or do we need something more configurable like Tait's patch ? I think we could have something with verbosity levels : - act the most transparently possible regarding our local issues (for browsers), which means replicate on the client side what we see on the server side (eg: close when we get a close). - maybe report errors in logs but still silently close the connection - only report suspicious errors (eg: 504 here) to the client - report them all (eg: for webservices where this helps detect a failing server) Before rushing on a patch we should also consider this with the http-reuse that was introduced in 1.6, to ensure we don't end up with something ugly. > I traced back that change to: > http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab > > I don't understand why it's saner to kill the connection and hide the 504 > instead of clearly stating the error and let the application handle the > timeout. As explained above, it's because a keep-alive enabled client must implement the ability to replay requests for which it didn't get a response because the connection died. In fact we're forwarding to the client what we saw on the server side so that the client can take the correct decision. If your client was directly connected to the server, it would have seen the exact same behaviour. Regards, Willy
Re: HAProxy does not write 504 on keep-alive connections
Thanks for the reply guys, after investigating the source code, it looks like this behavior is wanted. I've been able to "fix it" by removing these two lines: diff --git a/src/proto_http.c b/src/proto_http.c index 2dcac06..d33b4a1 100644 --- a/src/proto_http.c +++ b/src/proto_http.c @@ -6125,8 +6125,6 @@ int http_wait_for_response(struct stream *s, struct channel *rep, int an_bit) else if (rep->flags & CF_READ_TIMEOUT) { if (msg->err_pos >= 0) http_capture_bad_message(&s->be->invalid_rep, s, msg, msg->msg_state, sess->fe); - else if (txn->flags & TX_NOT_FIRST) - goto abort_keep_alive; s->be->be_counters.failed_resp++; if (objt_server(s->target)) { I traced back that change to: http://git.haproxy.org/?p=haproxy-1.6.git;a=commit;h=6b726adb35d998eb55671c0d98ef889cb9fd64ab I don't understand why it's saner to kill the connection and hide the 504 instead of clearly stating the error and let the application handle the timeout. On Wed, Nov 11, 2015 at 3:11 AM Tait Clarridge wrote: > On Tue, Nov 10, 2015 at 7:32 PM, Bryan Talbot > wrote: > > On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta > > > wrote: > >> > >> Hi there, > >> I think there's a bug in how HAProxy handles timeout, that'd be great if > >> you can confirm or help me figure out what I do wrong: > >> > >> Basically: if a server timeout happens on a keep-alive connection, > haproxy > >> does not write a 504 response before closing the socket. > > > > > > I believe this is the same behaviour that I submitted a patch (without > a lot of context in the description) for in > https://www.marc.info/?l=haproxy&m=141822428718492&w=4 > > We have been running a patched version of 1.5.x with this applied in > our environment for almost a year, which although it has a very > specific use case and does not touch all the features of the project, > has had zero issues with leaking sockets/memory (or segfaults). > >
Re: HAProxy does not write 504 on keep-alive connections
On Tue, Nov 10, 2015 at 7:32 PM, Bryan Talbot wrote: > On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta > wrote: >> >> Hi there, >> I think there's a bug in how HAProxy handles timeout, that'd be great if >> you can confirm or help me figure out what I do wrong: >> >> Basically: if a server timeout happens on a keep-alive connection, haproxy >> does not write a 504 response before closing the socket. > > I believe this is the same behaviour that I submitted a patch (without a lot of context in the description) for in https://www.marc.info/?l=haproxy&m=141822428718492&w=4 We have been running a patched version of 1.5.x with this applied in our environment for almost a year, which although it has a very specific use case and does not touch all the features of the project, has had zero issues with leaking sockets/memory (or segfaults).
Re: HAProxy does not write 504 on keep-alive connections
On Tue, Nov 10, 2015 at 12:04 PM, Laurent Senta wrote: > Hi there, > I think there's a bug in how HAProxy handles timeout, that'd be great if > you can confirm or help me figure out what I do wrong: > > Basically: if a server timeout happens on a keep-alive connection, haproxy > does not write a 504 response before closing the socket. > I am able to reproduce this behavior too. I tried a few versions of haproxy all the way back to 1.4.0 and they all did this. The general way to reproduce is: 1) open tcp connection 2) make request that completes before 'timeout server' / 'timeout client' 3) make request that does NOT complete before 'timeout server' on same tcp connection The second request gets no response and the connection is just closed. If the same request (#3) is made on a new tcp connection that did not have a previously successful response, the response is a 504 and not the silently closed connection. The haproxy.cfg I'm using is global maxconn 4096 defaults mode http timeout connect 5s timeout client 450ms timeout server 450ms listen http bind :8000 server slowserv 127.0.0.1:8002 And 'slowserv' simply sleeps for the amount of time requested through a query string parameter. Note that curl and httpclient it seems is not good to test in this situation because both do retry the request that receives an empty response -- even if that request was a POST. I'm able to see the successful and empty-responses using wget like this by forcing wget to not retry (only try once). In these cases, the backend will take 0.4 seconds and 0.5 seconds to respond (respectively). The timeout server is configured to strike at 0.45 seconds. $> wget --tries 1 -O /dev/null "http://127.0.0.1:8000/?0.4"; " http://127.0.0.1:8000/?0.5"; > > This leads python to fail with a serious BadStatusLineError instead of a > simple http error. > And Ruby to retry potentially non-idempotent methods. > > Here's a basic setup to reproduce the error: > https://gist.github.com/lsenta/1d33c6a01c07b32ac18a > > I've also had some help by meineerde on irc, here's the haproxy logs with > a ruby client doing the same request: > https://gist.github.com/meineerde/87a571c57369d322dae0#gistcomment-1617687 > > In case there is some confusion, the comment about "6 gets instead of 5" is due to the ruby httpclient or curl retrying on that end and is not being re-tried by haproxy. > I've seen this behavior with v.1.6.2, 1.5.15 and 1.6.0 > and several 1.5.x, 1.4.26, 1.4.6, and 1.4.0 all the same. -Bryan
HAProxy does not write 504 on keep-alive connections
Hi there, I think there's a bug in how HAProxy handles timeout, that'd be great if you can confirm or help me figure out what I do wrong: Basically: if a server timeout happens on a keep-alive connection, haproxy does not write a 504 response before closing the socket. This leads python to fail with a serious BadStatusLineError instead of a simple http error. And Ruby to retry potentially non-idempotent methods. Here's a basic setup to reproduce the error: https://gist.github.com/lsenta/1d33c6a01c07b32ac18a I've also had some help by meineerde on irc, here's the haproxy logs with a ruby client doing the same request: https://gist.github.com/meineerde/87a571c57369d322dae0#gistcomment-1617687 I've seen this behavior with v.1.6.2, 1.5.15 and 1.6.0 Best,