Re: haproxy intermittently not connecting to backend
Hi Patrick,

On Sat, Apr 12, 2014 at 01:38:54AM -0400, Patrick Hemmer wrote:
> This just keeps coming back to bug me. I don't think the client closing
> the connection should result in a 5XX code. 5XX should indicate a server
> issue, and the client closing the connection before the server has a
> chance to respond isn't a server issue. Only if the server doesn't
> respond within the configured timeout should it be a 5XX.
>
> Nginx uses 499 for "client closed connection". Perhaps haproxy could use
> that status code as well when `option abortonclose` is used.

It's wrong to invent new status codes, because sooner or later they'll conflict with an officially assigned one (or worse, they'll become so widely used that they make it harder to improve the standards).

I get your point though. I like to say that a 5xx is an error the client should not be able to induce. That's not entirely true nowadays, due to 501, and due to web services which like to return 500 when they really mean "false"... but in general that's the idea.

Here, however, it's not as black and white. If a client manages to close before the connection to the server is opened, it's generally because the server is taking a long time to accept the connection. The longer it takes, the higher the number of 503s due to client aborts. What we should try to avoid, in my opinion, is returning the 503 immediately. I think the semantics of 408 are the closest to what we want in terms of asking the client to retry if needed, even though this is technically a different issue. I'd rather not use a plain 400, to avoid polluting the real 400s that admins sometimes have a hard time tracking down.

Any opinion on this?

Willy
Re: haproxy intermittently not connecting to backend
*From:* Willy Tarreau w...@1wt.eu
*Sent:* 2014-04-14 11:27:59 E
*To:* Patrick Hemmer hapr...@stormcloud9.net
*CC:* haproxy@formilux.org
*Subject:* Re: haproxy intermittently not connecting to backend

> Hi Patrick,
>
> On Sat, Apr 12, 2014 at 01:38:54AM -0400, Patrick Hemmer wrote:
>> This just keeps coming back to bug me. I don't think the client closing
>> the connection should result in a 5XX code. 5XX should indicate a
>> server issue, and the client closing the connection before the server
>> has a chance to respond isn't a server issue. Only if the server
>> doesn't respond within the configured timeout should it be a 5XX.
>>
>> Nginx uses 499 for "client closed connection". Perhaps haproxy could
>> use that status code as well when `option abortonclose` is used.
>
> It's wrong to invent new status codes, because sooner or later they'll
> conflict with an officially assigned one (or worse, they'll become so
> widely used that they make it harder to improve the standards).

RFC 2616 says HTTP status codes are extensible, and it even gives a specific scenario for how the client should handle an unregistered code (look for the "if an unrecognized status code of 431 is received by the client" example).

> I get your point though. I like to say that a 5xx is an error the client
> should not be able to induce. That's not entirely true nowadays, due to
> 501, and due to web services which like to return 500 when they really
> mean "false"... but in general that's the idea.
>
> Here, however, it's not as black and white. If a client manages to close
> before the connection to the server is opened, it's generally because
> the server is taking a long time to accept the connection. The longer it
> takes, the higher the number of 503s due to client aborts. What we
> should try to avoid, in my opinion, is returning the 503 immediately. I
> think the semantics of 408 are the closest to what we want in terms of
> asking the client to retry if needed, even though this is technically a
> different issue. I'd rather not use a plain 400, to avoid polluting the
> real 400s that admins sometimes have a hard time tracking down.

I disagree with the statement that we should avoid an immediate response when the connection is closed. Going back to RFC 1945 (HTTP/1.0), we have this: "In any case, the closing of the connection by either or both parties always terminates the current request, regardless of its status." But that is HTTP/1.0, so its validity in this case is tenuous. I couldn't find a similar statement in RFC 2616, or anything which states how it should be handled when the client closes its connection prior to the response. I guess this is why it's a configurable option :-)

If we want to use a registered status code, I would argue in favor of 417, which has the following in its description: "if the server is a proxy, the server has unambiguous evidence that the request could not be met by the next-hop server".

Would it be difficult to add a parameter to the option, such as `option httpclose 417`, to control how haproxy responds?

> Any opinion on this?
>
> Willy
Re: haproxy intermittently not connecting to backend
On Mon, Apr 14, 2014 at 12:08:40PM -0400, Patrick Hemmer wrote:
>>> Nginx uses 499 for "client closed connection". Perhaps haproxy could
>>> use that status code as well when `option abortonclose` is used.
>>
>> It's wrong to invent new status codes, because sooner or later they'll
>> conflict with an officially assigned one (or worse, they'll become so
>> widely used that they make it harder to improve the standards).
>
> RFC 2616 says HTTP status codes are extensible, and it even gives a
> specific scenario for how the client should handle an unregistered code
> (look for the "if an unrecognized status code of 431 is received by the
> client" example).

Yes, but that was not meant to let anyone pick unregistered codes, only to make the protocol extensible. More info here:

    http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-8.2

    8.2. Status Code Registry

    The "HTTP Status Code Registry" defines the name space for the
    response status-code token (Section 6). The status code registry is
    maintained at http://www.iana.org/assignments/http-status-codes.

And 8.2.2 is even clearer about this:

    Proposals for new status codes that are not yet widely deployed ought
    to avoid allocating a specific number for the code until there is
    clear consensus that it will be registered; instead, early drafts can
    use a notation such as "4NN", or "3N0 .. 3N9", to indicate the class
    of the proposed status code(s) without consuming a number prematurely.

There was never any consensus for having a range of free status codes, as was done with headers starting with "X-", unfortunately.

>> Here, however, it's not as black and white. If a client manages to
>> close before the connection to the server is opened, it's generally
>> because the server is taking a long time to accept the connection. The
>> longer it takes, the higher the number of 503s due to client aborts.
>> What we should try to avoid, in my opinion, is returning the 503
>> immediately. I think the semantics of 408 are the closest to what we
>> want in terms of asking the client to retry if needed, even though this
>> is technically a different issue. I'd rather not use a plain 400, to
>> avoid polluting the real 400s that admins sometimes have a hard time
>> tracking down.
>
> I disagree with the statement that we should avoid an immediate response
> when the connection is closed. Going back to RFC 1945 (HTTP/1.0), we
> have this: "In any case, the closing of the connection by either or both
> parties always terminates the current request, regardless of its
> status." But that is HTTP/1.0, so its validity in this case is tenuous.
> I couldn't find a similar statement in RFC 2616, or anything which
> states how it should be handled when the client closes its connection
> prior to the response. I guess this is why it's a configurable option
> :-)

The problem is that the people who documented HTTP in the past didn't necessarily know at the time that TCP is two half-connections rather than a single full one. Unless you get a hard error on a socket, you never know whether the other side is closed; TCP offers no provision for this. TCP only says "that's all I have to say" (i.e., FIN). In practice, many scripts and even health checks from various systems send a request with an immediate FIN because they have nothing more to say. This will become even more common with TCP Fast Open, where it's very convenient to send SYN+request+FIN in a single packet and receive the response along with the SYN-ACK and a FIN as well. Almost UDP!
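To make the half-close concrete, here is a minimal sketch (Python, assuming an HTTP server reachable on 127.0.0.1:80; adjust the address). The client sends its FIN immediately after the request yet can still read the response afterwards, which is exactly why a FIN alone doesn't prove the client has gone away:

    import socket

    # Assumed address of any HTTP server (adjust as needed).
    s = socket.create_connection(("127.0.0.1", 80))
    s.sendall(b"GET /ping HTTP/1.0\r\nHost: example\r\n\r\n")

    # Send our FIN right away: "that's all I have to say". Only the write
    # side closes; the read side stays open, so a server that does not
    # treat the half-close as an abort can still deliver its response.
    s.shutdown(socket.SHUT_WR)

    response = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:            # the server's own FIN
            break
        response += chunk
    s.close()
    print(response.split(b"\r\n", 1)[0].decode())  # e.g. "HTTP/1.0 200 OK"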
> If we want to use a registered status code, I would argue in favor of
> 417, which has the following in its description: "if the server is a
> proxy, the server has unambiguous evidence that the request could not be
> met by the next-hop server".

I thought about it, but 417 has a very specific meaning related to the Expect header, and it will definitely have side effects when sent in response to a request carrying an Expect header (such as the client never being able to reformulate the request).

> Would it be difficult to add a parameter to the option, such as
> `option httpclose 417`, to control how haproxy responds?

It's an idea worth thinking about. It would be for abortonclose rather than httpclose, but the idea is there. One thing I don't like about overloading options is that in the past they were plain booleans and could be cancelled by putting "no" in front of them. As time goes by, some options take extra arguments, which makes them harder to cancel, or more confusing. Maybe we could do something even better: enumerate the long list of all events which lead to a fixed status code, and make it possible to remap them. I'm pretty sure that for some users this would be really nice, even if it allows a few of them to write stupid configurations.

Regards,
Willy
Re: haproxy intermittently not connecting to backend
This just keeps coming back to bug me. I don't think the client closing the connection should result in a 5XX code. 5XX should indicate a server issue, and the client closing the connection before the server has a chance to respond isn't a server issue. Only if the server doesn't respond within the configured timeout should it be a 5XX.

Nginx uses 499 for "client closed connection". Perhaps haproxy could use that status code as well when `option abortonclose` is used.

-Patrick

*From:* Patrick Hemmer hapr...@stormcloud9.net
*Sent:* 2014-04-02 15:50:22 E
*To:* haproxy@formilux.org
*Subject:* Re: haproxy intermittently not connecting to backend

> That makes perfect sense. Thank you very much.
>
> -Patrick
>
> *From:* Willy Tarreau w...@1wt.eu
> *Sent:* 2014-04-02 15:38:04 E
> *To:* Patrick Hemmer hapr...@stormcloud9.net
> *CC:* haproxy@formilux.org
> *Subject:* Re: haproxy intermittently not connecting to backend
>
>> Hi Patrick,
>>
>> On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
>>> We have an issue with haproxy (1.5-dev22-1a34d57) where it
>>> intermittently does not connect to the backend server, and the
>>> behavior it exhibits seems strange. The reason I say strange is that
>>> in one example it logged that the client disconnected after ~49
>>> seconds, with connection flags of CC--. However, our config has
>>> "timeout connect 5000", so it should have timed out connecting to the
>>> backend server after 5 seconds. Additionally, we have "retries 3" in
>>> the config, so upon timing out it should have tried another backend
>>> server, but it never did (the retries counter in the log shows 0).
>>
>> No, retries only affects retries to the same server; it's "option
>> redispatch" which allows the last retry to be performed on another
>> server. But you have it anyway.
>>
>>> At the time of this log entry, the backend server was responding
>>> properly, and for the ~49 seconds prior to it the backend server had
>>> been taking other requests. The backend server is also another haproxy
>>> (same version). Here's an example of one such log entry:
>>>
>>> [fixed version pasted here]
>>> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1
>>
>> OK, in fact the client did not wait 49 seconds. If you look closer,
>> you'll see that the client remained silent for 49 seconds (typically a
>> connection pool or a preconnect) and then closed immediately after
>> sending the request (in the same millisecond). Since you have "option
>> abortonclose", the connection was aborted before the server had a
>> chance to respond. So I can easily imagine that you randomly get this
>> error because you're in a race condition: if the server responds
>> immediately, you win the race and the request is handled; otherwise
>> it's aborted.
>>
>> Please start by removing "option abortonclose"; I think it will fix the
>> issue. The second thing you can do is remove "option httpclose" or
>> replace it with "option http-server-close", which is active rather than
>> merely passive. The connections will last less time on your servers,
>> which is always appreciated. I'm not seeing any other issue, so with
>> just this you should be fine.
>>
>>> The log format is defined as:
>>>
>>> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>>>
>>> Running a "show errors" on the stats socket did not return any
>>> relevant results. Here are the relevant portions of the haproxy
>>> config. It is not the entire thing, as the whole config is 1,513 lines
>>> long.
>>> global
>>>     log 127.0.0.1 local0
>>>     maxconn 20480
>>>     user haproxy
>>>     group haproxy
>>>     daemon
>>>     stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>>>
>>> defaults
>>>     log global
>>>     mode http
>>>     option httplog
>>>     option dontlognull
>>>     option log-separate-errors
>>>     retries 3
>>>     option redispatch
>>>     timeout connect 5000
>>>     timeout client 6
>>>     timeout server 17
>>>     option clitcpka
>>>     option srvtcpka
>>>     option abortonclose
>>>     option splice-auto
>>>     monitor-uri /haproxy/ping
>>>     stats enable
>>>     stats uri /haproxy/stats
>>>     stats refresh 15
>>>     stats auth user:pass
>>>
>>> frontend api
>>>     bind *:80
>>>     bind *:443 ssl crt /etc/haproxy/server.pem
>>>     maxconn 2
>>>     option httpclose
>>>     option forwardfor
>>>     acl internal src 10.0.0.0/8
>>>     acl have_request_id req.fhdr(X-Request-Id) -m found
>>>     http-request set-nice -100 if internal
>>>     http-request add-header X-API-URL %[path] if !internal
>>>     http-request add-header X-Request-Timestamp %Ts.%ms
>>>     http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if internal have_request_id
>>>     http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal || !have_request_id
>>>     http-request add-header X-API-Host i-4a3b1c6a
>>>     unique-id-format %{+X}o%pid-%rt
>>>     log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
Re: haproxy intermittently not connecting to backend
Hi Patrick,

On 01/04/2014 21:52, Patrick Hemmer wrote:
> Apologies, my mail client went stupid. Here's the log entry unmangled:
>
> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1

This line looks really weird: actconn, feconn, beconn and srv_conn are all equal to 0. Is it something you can reproduce easily? And can you test with the current snapshot? Is there heavy traffic on the server, such that "option httpclose" could cause port exhaustion? Also, can you provide your kernel version (to check that splicing is not buggy)?

> -Patrick
>
> *From:* Patrick Hemmer hapr...@stormcloud9.net
> *Sent:* 2014-04-01 15:20:15 E
> *To:* haproxy@formilux.org
> *Subject:* haproxy intermittently not connecting to backend
>
>> We have an issue with haproxy (1.5-dev22-1a34d57) where it
>> intermittently does not connect to the backend server, and the behavior
>> it exhibits seems strange. The reason I say strange is that in one
>> example it logged that the client disconnected after ~49 seconds, with
>> connection flags of CC--. However, our config has "timeout connect
>> 5000", so it should have timed out connecting to the backend server
>> after 5 seconds. Additionally, we have "retries 3" in the config, so
>> upon timing out it should have tried another backend server, but it
>> never did (the retries counter in the log shows 0).
>>
>> At the time of this log entry, the backend server was responding
>> properly, and for the ~49 seconds prior to it the backend server had
>> been taking other requests. The backend server is also another haproxy
>> (same version). Here's an example of one such log entry:
>>
>> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12345&sender=27B9A93C-F473-4385-A662-352AD34A2453 HTTP/1.1
>>
>> The log format is defined as:
>>
>> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>>
>> Running a "show errors" on the stats socket did not return any relevant
>> results. Here are the relevant portions of the haproxy config. It is
>> not the entire thing, as the whole config is 1,513 lines long.
>> global
>>     log 127.0.0.1 local0
>>     maxconn 20480
>>     user haproxy
>>     group haproxy
>>     daemon
>>     stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>>
>> defaults
>>     log global
>>     mode http
>>     option httplog
>>     option dontlognull
>>     option log-separate-errors
>>     retries 3
>>     option redispatch
>>     timeout connect 5000
>>     timeout client 6
>>     timeout server 17
>>     option clitcpka
>>     option srvtcpka
>>     option abortonclose
>>     option splice-auto
>>     monitor-uri /haproxy/ping
>>     stats enable
>>     stats uri /haproxy/stats
>>     stats refresh 15
>>     stats auth user:pass
>>
>> frontend api
>>     bind *:80
>>     bind *:443 ssl crt /etc/haproxy/server.pem
>>     maxconn 2
>>     option httpclose
>>     option forwardfor
>>     acl internal src 10.0.0.0/8
>>     acl have_request_id req.fhdr(X-Request-Id) -m found
>>     http-request set-nice -100 if internal
>>     http-request add-header X-API-URL %[path] if !internal
>>     http-request add-header X-Request-Timestamp %Ts.%ms
>>     http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if internal have_request_id
>>     http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal || !have_request_id
>>     http-request add-header X-API-Host i-4a3b1c6a
>>     unique-id-format %{+X}o%pid-%rt
>>     log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>>     default_backend DEFAULT_404
>>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>>     acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
>>     acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT PATCH
>>     http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path nqXn_method
>>     acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
>>     use_backend platform-push if rewrite-nqXn
>>     reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\ /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\ /1/sync/subscribe/\2\3 if rewrite-nqXn
>>
>> backend platform-push
>>     option httpchk GET /ping
>>     default-server inter 15s fastinter 1s
>>     server i-6eaf724d 10.230.23.64:80 check observe layer4
>>     server i-84d931a5 10.230.42.8:80 check observe layer4

--
Cyril Bonté
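As an aside, the reqrep rule in the quoted config can be sanity-checked outside haproxy. A small sketch in Python, with the pattern transcribed from haproxy syntax (where "\ " escapes a literal space) and a made-up request line:

    import re

    # The frontend's reqrep pattern, transcribed to Python regex syntax
    # (haproxy escapes literal spaces as "\ "; Python does not need that).
    pattern = re.compile(
        r"^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)"
        r" /1/sync/notifications/subscribe/([^ ?]*)([ ?].*|$)"
    )

    reqline = "GET /1/sync/notifications/subscribe/box42?x=1 HTTP/1.1"
    print(pattern.sub(r"\1 /1/sync/subscribe/\2\3", reqline))
    # -> GET /1/sync/subscribe/box42?x=1 HTTP/1.1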
Re: haproxy intermittently not connecting to backend
Hi Patrick,

On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
> We have an issue with haproxy (1.5-dev22-1a34d57) where it intermittently
> does not connect to the backend server, and the behavior it exhibits
> seems strange. The reason I say strange is that in one example it logged
> that the client disconnected after ~49 seconds, with connection flags of
> CC--. However, our config has "timeout connect 5000", so it should have
> timed out connecting to the backend server after 5 seconds. Additionally,
> we have "retries 3" in the config, so upon timing out it should have
> tried another backend server, but it never did (the retries counter in
> the log shows 0).

No, retries only affects retries to the same server; it's "option redispatch" which allows the last retry to be performed on another server. But you have it anyway.

> At the time of this log entry, the backend server was responding
> properly, and for the ~49 seconds prior to it the backend server had been
> taking other requests. The backend server is also another haproxy (same
> version). Here's an example of one such log entry:
>
> [fixed version pasted here]
> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1

OK, in fact the client did not wait 49 seconds. If you look closer, you'll see that the client remained silent for 49 seconds (typically a connection pool or a preconnect) and then closed immediately after sending the request (in the same millisecond). Since you have "option abortonclose", the connection was aborted before the server had a chance to respond. So I can easily imagine that you randomly get this error because you're in a race condition: if the server responds immediately, you win the race and the request is handled; otherwise it's aborted.

Please start by removing "option abortonclose"; I think it will fix the issue. The second thing you can do is remove "option httpclose" or replace it with "option http-server-close", which is active rather than merely passive. The connections will last less time on your servers, which is always appreciated. I'm not seeing any other issue, so with just this you should be fine.

> The log format is defined as:
>
> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>
> Running a "show errors" on the stats socket did not return any relevant
> results. Here are the relevant portions of the haproxy config. It is not
> the entire thing, as the whole config is 1,513 lines long.
> global
>     log 127.0.0.1 local0
>     maxconn 20480
>     user haproxy
>     group haproxy
>     daemon
>     stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>
> defaults
>     log global
>     mode http
>     option httplog
>     option dontlognull
>     option log-separate-errors
>     retries 3
>     option redispatch
>     timeout connect 5000
>     timeout client 6
>     timeout server 17
>     option clitcpka
>     option srvtcpka
>     option abortonclose
>     option splice-auto
>     monitor-uri /haproxy/ping
>     stats enable
>     stats uri /haproxy/stats
>     stats refresh 15
>     stats auth user:pass
>
> frontend api
>     bind *:80
>     bind *:443 ssl crt /etc/haproxy/server.pem
>     maxconn 2
>     option httpclose
>     option forwardfor
>     acl internal src 10.0.0.0/8
>     acl have_request_id req.fhdr(X-Request-Id) -m found
>     http-request set-nice -100 if internal
>     http-request add-header X-API-URL %[path] if !internal
>     http-request add-header X-Request-Timestamp %Ts.%ms
>     http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if internal have_request_id
>     http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal || !have_request_id
>     http-request add-header X-API-Host i-4a3b1c6a
>     unique-id-format %{+X}o%pid-%rt
>     log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>     default_backend DEFAULT_404
>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>     acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
>     acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT PATCH
>     http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path nqXn_method
>     acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
>     use_backend platform-push if rewrite-nqXn
>     reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\ /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\ /1/sync/subscribe/\2\3 if rewrite-nqXn
>
> backend platform-push
>     option httpchk GET /ping
>     default-server inter 15s fastinter 1s
>     server i-6eaf724d 10.230.23.64:80 check observe layer4
>     server i-84d931a5 10.230.42.8:80 check observe layer4

Regards,
Willy
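The race described above is easy to reproduce by hand. A minimal sketch (Python, assuming the "api" frontend is reachable on 127.0.0.1:80; address and request line are placeholders) of a client that closes in the same millisecond as the request, which with "option abortonclose" should intermittently produce exactly this 503 CC-- log line:

    import socket

    # Assumed address of the "api" frontend described in this thread.
    s = socket.create_connection(("127.0.0.1", 80))
    s.sendall(b"GET /1/sync/notifications/subscribe?sync_box_id=1 HTTP/1.1\r\n"
              b"Host: api\r\n\r\n")

    # Close immediately after the request, like the pooled/preconnect
    # client in the log: with "option abortonclose", whether the request
    # is served or aborted as 503 CC-- depends on who wins the race.
    s.close()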
Re: haproxy intermittently not connecting to backend
That makes perfect sense. Thank you very much.

-Patrick

*From:* Willy Tarreau w...@1wt.eu
*Sent:* 2014-04-02 15:38:04 E
*To:* Patrick Hemmer hapr...@stormcloud9.net
*CC:* haproxy@formilux.org
*Subject:* Re: haproxy intermittently not connecting to backend

> Hi Patrick,
>
> On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
>> We have an issue with haproxy (1.5-dev22-1a34d57) where it
>> intermittently does not connect to the backend server, and the behavior
>> it exhibits seems strange. The reason I say strange is that in one
>> example it logged that the client disconnected after ~49 seconds, with
>> connection flags of CC--. However, our config has "timeout connect
>> 5000", so it should have timed out connecting to the backend server
>> after 5 seconds. Additionally, we have "retries 3" in the config, so
>> upon timing out it should have tried another backend server, but it
>> never did (the retries counter in the log shows 0).
>
> No, retries only affects retries to the same server; it's "option
> redispatch" which allows the last retry to be performed on another
> server. But you have it anyway.
>
>> At the time of this log entry, the backend server was responding
>> properly, and for the ~49 seconds prior to it the backend server had
>> been taking other requests. The backend server is also another haproxy
>> (same version). Here's an example of one such log entry:
>>
>> [fixed version pasted here]
>> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1
>
> OK, in fact the client did not wait 49 seconds. If you look closer,
> you'll see that the client remained silent for 49 seconds (typically a
> connection pool or a preconnect) and then closed immediately after
> sending the request (in the same millisecond). Since you have "option
> abortonclose", the connection was aborted before the server had a chance
> to respond. So I can easily imagine that you randomly get this error
> because you're in a race condition: if the server responds immediately,
> you win the race and the request is handled; otherwise it's aborted.
>
> Please start by removing "option abortonclose"; I think it will fix the
> issue. The second thing you can do is remove "option httpclose" or
> replace it with "option http-server-close", which is active rather than
> merely passive. The connections will last less time on your servers,
> which is always appreciated. I'm not seeing any other issue, so with
> just this you should be fine.
>
>> The log format is defined as:
>>
>> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>>
>> Running a "show errors" on the stats socket did not return any relevant
>> results. Here are the relevant portions of the haproxy config. It is
>> not the entire thing, as the whole config is 1,513 lines long.
>> global
>>     log 127.0.0.1 local0
>>     maxconn 20480
>>     user haproxy
>>     group haproxy
>>     daemon
>>     stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>>
>> defaults
>>     log global
>>     mode http
>>     option httplog
>>     option dontlognull
>>     option log-separate-errors
>>     retries 3
>>     option redispatch
>>     timeout connect 5000
>>     timeout client 6
>>     timeout server 17
>>     option clitcpka
>>     option srvtcpka
>>     option abortonclose
>>     option splice-auto
>>     monitor-uri /haproxy/ping
>>     stats enable
>>     stats uri /haproxy/stats
>>     stats refresh 15
>>     stats auth user:pass
>>
>> frontend api
>>     bind *:80
>>     bind *:443 ssl crt /etc/haproxy/server.pem
>>     maxconn 2
>>     option httpclose
>>     option forwardfor
>>     acl internal src 10.0.0.0/8
>>     acl have_request_id req.fhdr(X-Request-Id) -m found
>>     http-request set-nice -100 if internal
>>     http-request add-header X-API-URL %[path] if !internal
>>     http-request add-header X-Request-Timestamp %Ts.%ms
>>     http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if internal have_request_id
>>     http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal || !have_request_id
>>     http-request add-header X-API-Host i-4a3b1c6a
>>     unique-id-format %{+X}o%pid-%rt
>>     log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>>     default_backend DEFAULT_404
>>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>>     acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
>>     acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT PATCH
>>     http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path nqXn_method
>>     acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
>>     use_backend platform-push if rewrite-nqXn
>>     reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\ /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\ /1/sync/subscribe/\2\3 if rewrite-nqXn
>>
>> backend platform-push
>>     option httpchk GET /ping
>>     default-server inter 15s fastinter 1s
>>     server i-6eaf724d 10.230.23.64:80 check observe layer4
>>     server i-84d931a5 10.230.42.8:80 check observe layer4
Re: haproxy intermittently not connecting to backend
Apologies, my mail client went stupid. Here's the log entry unmangled:

198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1

-Patrick

*From:* Patrick Hemmer hapr...@stormcloud9.net
*Sent:* 2014-04-01 15:20:15 E
*To:* haproxy@formilux.org
*Subject:* haproxy intermittently not connecting to backend

> We have an issue with haproxy (1.5-dev22-1a34d57) where it intermittently
> does not connect to the backend server, and the behavior it exhibits
> seems strange. The reason I say strange is that in one example it logged
> that the client disconnected after ~49 seconds, with connection flags of
> CC--. However, our config has "timeout connect 5000", so it should have
> timed out connecting to the backend server after 5 seconds. Additionally,
> we have "retries 3" in the config, so upon timing out it should have
> tried another backend server, but it never did (the retries counter in
> the log shows 0).
>
> At the time of this log entry, the backend server was responding
> properly, and for the ~49 seconds prior to it the backend server had been
> taking other requests. The backend server is also another haproxy (same
> version). Here's an example of one such log entry:
>
> 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET /1/sync/notifications/subscribe?sync_box_id=12345&sender=27B9A93C-F473-4385-A662-352AD34A2453 HTTP/1.1
>
> The log format is defined as:
>
> %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>
> Running a "show errors" on the stats socket did not return any relevant
> results. Here are the relevant portions of the haproxy config. It is not
> the entire thing, as the whole config is 1,513 lines long.
> global
>     log 127.0.0.1 local0
>     maxconn 20480
>     user haproxy
>     group haproxy
>     daemon
>     stats socket /var/run/hapi/haproxy/haproxy.sock level admin
>
> defaults
>     log global
>     mode http
>     option httplog
>     option dontlognull
>     option log-separate-errors
>     retries 3
>     option redispatch
>     timeout connect 5000
>     timeout client 6
>     timeout server 17
>     option clitcpka
>     option srvtcpka
>     option abortonclose
>     option splice-auto
>     monitor-uri /haproxy/ping
>     stats enable
>     stats uri /haproxy/stats
>     stats refresh 15
>     stats auth user:pass
>
> frontend api
>     bind *:80
>     bind *:443 ssl crt /etc/haproxy/server.pem
>     maxconn 2
>     option httpclose
>     option forwardfor
>     acl internal src 10.0.0.0/8
>     acl have_request_id req.fhdr(X-Request-Id) -m found
>     http-request set-nice -100 if internal
>     http-request add-header X-API-URL %[path] if !internal
>     http-request add-header X-Request-Timestamp %Ts.%ms
>     http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if internal have_request_id
>     http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal || !have_request_id
>     http-request add-header X-API-Host i-4a3b1c6a
>     unique-id-format %{+X}o%pid-%rt
>     log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
>     default_backend DEFAULT_404
>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>     acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
>     acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT PATCH
>     http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path nqXn_method
>     acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
>     use_backend platform-push if rewrite-nqXn
>     reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\ /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\ /1/sync/subscribe/\2\3 if rewrite-nqXn
>
> backend platform-push
>     option httpchk GET /ping
>     default-server inter 15s fastinter 1s
>     server i-6eaf724d 10.230.23.64:80 check observe layer4
>     server i-84d931a5 10.230.42.8:80 check observe layer4
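To line the sample entry up with the log-format, here is a small helper (Python; the field names are chosen for illustration and follow the %-variables in the log-format above):

    # Splits one entry of the log-format quoted above. Field names mirror
    # the %-variables (%ci:%cp, %ft, %b/%s, %Tq/%Tw/%Tc/%Tr/%Tt, ...).
    def parse_entry(line):
        head, request = line.split(" + ", 1)          # %r follows the "+"
        (client, frontend, backend_server, timers, conns,
         queues, traffic, status, flags, uid) = head.split()
        backend, server = backend_server.split("/")
        timers = dict(zip(("Tq", "Tw", "Tc", "Tr", "Tt"), timers.split("/")))
        return {"client": client, "frontend": frontend, "backend": backend,
                "server": server, "timers": timers,
                "actconn/feconn/beconn/srv_conn/retries": conns,
                "srv_queue/backend_queue": queues,
                "bytes_in/bytes_out": traffic,
                "status": status, "term_flags": flags, "unique_id": uid,
                "request": request}

    entry = ("198.228.211.13:60848 api~ platform-push/i-84d931a5 "
             "49562/0/-1/-1/49563 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 "
             "+ GET /1/sync/notifications/subscribe?sync_box_id=12496"
             "&sender=D7A9F93D-F653-4527-A022-383AD55A1943 HTTP/1.1")
    print(parse_entry(entry)["timers"]["Tq"])  # 49562 ms of client silence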