Re: haproxy intermittently not connecting to backend

2014-04-14 Thread Willy Tarreau
Hi Patrick,

On Sat, Apr 12, 2014 at 01:38:54AM -0400, Patrick Hemmer wrote:
 This just keeps coming back to bug me. I don't think the client closing
 the connection should result in a 5XX code. 5XX should indicate a server
 issue, and the client closing the connection before the server has a
 chance to respond isn't a server issue. Only if the server doesn't
 respond within the configured timeout should it be a 5XX.
 
 Nginx uses 499 for client closed connection. Perhaps haproxy could use
 that status code as well when `option abortonclose` is used.

It's wrong to invent new status codes, because sooner or later they'll
conflict with one that is officially assigned (or worse, they'll become so
widely used that they make it harder to improve the standards).

I get your point though. I usually say that 5xx is an error that the
client should not be able to induce. That's not entirely true nowadays,
due to 501, and due to web services which like to return 500 when they
really mean false... But in general that's the idea.

However here it's not as black and white. If a client manages to close
before the connection to the server is opened, it's generally because
the server takes time to accept the connection. The longer it takes,
the higher the number of 503 due to client aborts. What we should try
to avoid in my opinion is to return 503 immediately. I think that the
semantics of 408 are the closest to what we're expecting in terms of
asking the client to retry if needed, even though that's a different
technical issue. I'd rather not use plain 400, to avoid polluting the
real 400s that admins sometimes have a hard time tracking down.
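
As a stopgap, note that errorfile can already be abused for this today,
since the file contents are returned verbatim and may carry a different
status line than the code being remapped. A minimal sketch (the file path
is just an example of my choosing):

    defaults
      # Remap haproxy-generated 503s to a hand-written 408 response.
      # Caveat: this affects *every* 503 haproxy emits, not only those
      # caused by client aborts, which is exactly why a finer-grained
      # mechanism would be preferable.
      errorfile 503 /etc/haproxy/errors/408-abort.http

where /etc/haproxy/errors/408-abort.http contains a complete response:

    HTTP/1.0 408 Request Time-out
    Cache-Control: no-cache
    Connection: close
    Content-Type: text/html

    <html><body><h1>408 Request Time-out</h1></body></html>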

Any opinion on this?

Willy




Re: haproxy intermittently not connecting to backend

2014-04-14 Thread Patrick Hemmer
*From: *Willy Tarreau w...@1wt.eu
*Sent: * 2014-04-14 11:27:59 E
*To: *Patrick Hemmer hapr...@stormcloud9.net
*CC: *haproxy@formilux.org
*Subject: *Re: haproxy intermittently not connecting to backend

 Hi Patrick,

 On Sat, Apr 12, 2014 at 01:38:54AM -0400, Patrick Hemmer wrote:
 This just keeps coming back to bug me. I don't think the client closing
 the connection should result in a 5XX code. 5XX should indicate a server
 issue, and the client closing the connection before the server has a
 chance to respond isn't a server issue. Only if the server doesn't
 respond within the configured timeout should it be a 5XX.

 Nginx uses 499 for client closed connection. Perhaps haproxy could use
 that status code as well when `option abortonclose` is used.
 It's wrong to invent new status codes, because sooner or later they'll
 conflict with one that is officially assigned (or worse, they'll become so
 widely used that they make it harder to improve the standards).
RFC2616 says HTTP status codes are extensible, and even gives a
specific scenario for how the client should handle an unregistered code
(look for the "if an unrecognized status code of 431 is received by the
client" example).

 I get your point though. I usually say that 5xx is an error that the
 client should not be able to induce. That's not entirely true nowadays,
 due to 501, and due to web services which like to return 500 when they
 really mean false... But in general that's the idea.

 However here it's not as black and white. If a client manages to close
 before the connection to the server is opened, it's generally because
 the server takes time to accept the connection. The longer it takes,
 the higher the number of 503 due to client aborts. What we should try
 to avoid in my opinion is to return 503 immediately. I think that the
 semantics of 408 are the closest to what we're expecting in terms of
 asking the client to retry if needed, even though that's a different
 technical issue. I'd rather not use plain 400, to avoid polluting the
 real 400s that admins sometimes have a hard time tracking down.
I disagree with the statement that we should avoid an immediate response
when the connection is closed. Going back to RFC1945 (HTTP 1.0), we have
this:
"In any case, the closing of the connection by either or both parties
always terminates the current request, regardless of its status."
But that is HTTP 1.0, so its validity in this case is tenuous. I
couldn't find a similar statement in RFC2616, or anything which states
how it should be handled when the client closes its connection prior to
the response. I guess this is why it's a configurable option :-)

If we want to use a registered status code, I would argue in favor of
417, which has the following in its description:
"if the server is a proxy, the server has unambiguous evidence that the
request could not be met by the next-hop server"

Would it be difficult to add a parameter to the option, such as "option
httpclose 417", to control how haproxy responds?
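
Purely hypothetical syntax, just to illustrate (no such parameter exists
today):

    frontend api
      # hypothetical: status code to return when the client closes the
      # connection before the server has responded
      option httpclose 417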

 Any opinion on this?

 Willy





Re: haproxy intermittently not connecting to backend

2014-04-14 Thread Willy Tarreau
On Mon, Apr 14, 2014 at 12:08:40PM -0400, Patrick Hemmer wrote:
  Nginx uses 499 for client closed connection. Perhaps haproxy could use
  that status code as well when `option abortonclose` is used.
  It's wrong to invent new status codes, because sooner or later they'll
  conflict with one that is officially assigned (or worse, they'll become so
  widely used that they make it harder to improve the standards).
 RFC2616 says HTTP status codes are extensible, and even gives a
 specific scenario for how the client should handle an unregistered code
 (look for the "if an unrecognized status code of 431 is received by the
 client" example).

Yes, but that was not meant for anyone to pick unregistered codes, but for
the protocol to be extensible. More info here:

  http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-26#section-8.2

  8.2. Status Code Registry

   The HTTP Status Code Registry defines the name space for the response
   status-code token (Section 6).  The status code registry is
   maintained at http://www.iana.org/assignments/http-status-codes.

  And 8.2.2 is even clearer about this:

   Proposals for new status codes that are not yet widely deployed ought
   to avoid allocating a specific number for the code until there is
   clear consensus that it will be registered; instead, early drafts can
   use a notation such as "4NN", or "3N0 .. 3N9", to indicate the
   class of the proposed status code(s) without consuming a number
   prematurely.

Unfortunately, there was never any consensus for reserving a range of
free status codes, the way it was done for headers starting with X-.

  However here it's not as black and white. If a client manages to close
  before the connection to the server is opened, it's generally because
  the server takes time to accept the connection. The longer it takes,
  the higher the number of 503 due to client aborts. What we should try
  to avoid in my opinion is to return 503 immediately. I think that the
  semantics of 408 are the closest to what we're expecting in terms of
  asking the client to retry if needed, even though that's a different
  technical issue. I'd rather not use plain 400, to avoid polluting the
  real 400s that admins sometimes have a hard time tracking down.
 I disagree with the statement that we should avoid an immediate response
 when the connection is closed. Going back to RFC1945 (HTTP 1.0), we have
 this:
 "In any case, the closing of the connection by either or both parties
 always terminates the current request, regardless of its status."
 But that is HTTP 1.0, so its validity in this case is tenuous. I
 couldn't find a similar statement in RFC2616, or anything which states
 how it should be handled when the client closes its connection prior to
 the response. I guess this is why it's a configurable option :-)

The problem is that the people who documented HTTP in the past didn't
necessarily know back then that TCP was two half-connections and not a
single full one. Unless you have a hard error on a socket, you never know
whether the other side is closed, and TCP offers no provisions for this.
TCP only says "that's all I have to say" (ie: FIN).

In practice, many scripts and even health checks from various systems do
send a request with a FIN immediately because they have nothing more to
say. It will become even more common with TCP Fast Open, where it's very
convenient to send SYN+request+FIN in a single packet and receive the
response along with the SYN-ACK and a FIN as well. Almost UDP!
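
Incidentally, haproxy can already accept such exchanges when TCP Fast
Open is enabled on a listener (sketch only; it requires a recent Linux
kernel and a build with USE_TFO):

    frontend api
      # accept SYN+request(+FIN) in a single packet from TFO clients
      bind *:80 tfo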

 If we want to use a registered status code, I would argue in favor of
 417, which has the following in its description:
 "if the server is a proxy, the server has unambiguous evidence that the
 request could not be met by the next-hop server"

I thought about it, but 417 has a very specific meaning related to the
Expect header, and it will definitely have side effects when sent in
response to a request carrying an Expect header (such as the client never
being able to reformulate the request).

 Would it be difficult to add a parameter to the option, such as "option
 httpclose 417", to control how haproxy responds?

It's an idea that is worth thinking about. It would be for abortonclose
though, but the idea is there. One thing I don't like with overloading
options is that in the past they were plain booleans and could be
cancelled with "no" in front of them. As time goes on, some options take
extra arguments, which makes them harder to cancel, or more confusing.

Maybe we could do something even better, consisting of enumerating the
long list of all events which lead to a fixed status code, and making it
possible to remap them. I'm pretty sure that for some users it would be
really nice, even if it lets some of them write stupid configurations.
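
Something along these lines, with entirely made-up directive names just
to illustrate the idea (nothing like this exists today):

    defaults
      # hypothetical: remap the status code emitted for specific
      # internally-detected events
      status-remap client-abort    408
      status-remap connect-timeout 504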

Regards,
Willy




Re: haproxy intermittently not connecting to backend

2014-04-11 Thread Patrick Hemmer
This just keeps coming back to bug me. I don't think the client closing
the connection should result in a 5XX code. 5XX should indicate a server
issue, and the client closing the connection before the server has a
chance to respond isn't a server issue. Only if the server doesn't
respond within the configured timeout should it be a 5XX.

Nginx uses 499 for client closed connection. Perhaps haproxy could use
that status code as well when `option abortonclose` is used.

-Patrick



*From: *Patrick Hemmer hapr...@stormcloud9.net
*Sent: * 2014-04-02 15:50:22 E
*To: *haproxy@formilux.org
*Subject: *Re: haproxy intermittently not connecting to backend

 That makes perfect sense. Thank you very much.

 -Patrick

 
 *From: *Willy Tarreau w...@1wt.eu
 *Sent: * 2014-04-02 15:38:04 E
 *To: *Patrick Hemmer hapr...@stormcloud9.net
 *CC: *haproxy@formilux.org
 *Subject: *Re: haproxy intermittently not connecting to backend

 Hi Patrick,

 On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
 We have an issue with haproxy (1.5-dev22-1a34d57) where it is
 intermittently not connecting to the backend server. However the
 behavior it is exhibiting seems strange.
 The reason I say strange is that in one example, it logged that the
 client disconnected after ~49 seconds with connection flags of CC--.
 However our config has timeout connect 5000, so it should have timed
 out connecting to the backend server after 5 seconds. Additionally we
 have retries 3 in the config, so upon timing out, it should have tried
 another backend server, but it never did (the retries counter in the log
 shows 0).
 No, retries only affects retries to the same server; it's option redispatch
 which allows the last retry to be performed on another server. But you have
 it anyway.

 At the time of this log entry, the backend server is responding
 properly. For the ~49 seconds prior to the log entry, the backend server
 has taken other requests. The backend server is also another haproxy
 (same version).

 Here's an example of one such log entry:
 [fixed version pasted here]

 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 
 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET 
 /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
  HTTP/1.1
 OK in fact the client did not wait 49 seconds. If you look closer, you'll
 see that the client remained silent for 49 seconds (typically a connection
 pool or a preconnect) and closed immediately after sending the request (in
 the same millisecond). Since you have option abortonclose, the connection
 was aborted before the server had a chance to respond.

 So I can easily imagine that you randomly get this error: you're in a race
 condition. If the server responds immediately, you win the race and the
 request is handled; otherwise it's aborted.

 Please start by removing option abortonclose, I think it will fix the 
 issue.
 Second thing you can do is to remove option httpclose or replace it with
 option http-server-close which is active and not just passive. The 
 connections
 will last less time on your servers which is always appreciated.

 I'm not seeing any other issue, so with just this you should be fine.

 The log format is defined as:
 %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\
 %U/%B\ %ST\ %tsc\ %ID\ +\ %r

 Running "show errors" on the stats socket did not return any relevant
 results.

 Here's the relevant portions of the haproxy config. It is not the entire
 thing as the whole config is 1,513 lines long.

 global
   log 127.0.0.1 local0
   maxconn 20480
   user haproxy
   group haproxy
   daemon
   stats socket /var/run/hapi/haproxy/haproxy.sock level admin

 defaults
   log global
   mode http
   option httplog
   option dontlognull
   option log-separate-errors
   retries 3
   option redispatch
   timeout connect 5000
   timeout client 6
   timeout server 17
   option clitcpka
   option srvtcpka
   option abortonclose
   option splice-auto
   monitor-uri /haproxy/ping
   stats enable
   stats uri /haproxy/stats
   stats refresh 15
   stats auth user:pass

 frontend api
   bind *:80
   bind *:443 ssl crt /etc/haproxy/server.pem
   maxconn 2
   option httpclose
   option forwardfor
   acl internal src 10.0.0.0/8
   acl have_request_id req.fhdr(X-Request-Id) -m found
   http-request set-nice -100 if internal
   http-request add-header X-API-URL %[path] if !internal
   http-request add-header X-Request-Timestamp %Ts.%ms
   http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
 internal have_request_id
   http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
 !have_request_id
   http-request add-header X-API-Host i-4a3b1c6a
   unique-id-format %{+X}o%pid-%rt
   log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
 %ac/%fc/%bc

Re: haproxy intermittently not connecting to backend

2014-04-02 Thread Cyril Bonté

Hi Patrick,

On 01/04/2014 21:52, Patrick Hemmer wrote:

Apologies, my mail client went stupid. Here's the log entry unmangled:

198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563
0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET
/1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
HTTP/1.1


This line looks really weird: actconn, feconn, beconn and srv_conn are
all equal to 0.


Is it something you can reproduce easily? And can you test with the
current snapshot?


Is there heavy traffic on the server, where option httpclose could
cause port exhaustion?
Also, can you provide your kernel version (to check that splicing is
not buggy)?





-Patrick



*From: *Patrick Hemmer hapr...@stormcloud9.net
*Sent: * 2014-04-01 15:20:15 E
*To: *haproxy@formilux.org
*Subject: *haproxy intermittently not connecting to backend


We have an issue with haproxy (1.5-dev22-1a34d57) where it is
intermittently not connecting to the backend server. However the
behavior it is exhibiting seems strange.
The reason I say strange is that in one example, it logged that the
client disconnected after ~49 seconds with connection flags of
CC--. However our config has timeout connect 5000, so it should
have timed out connecting to the backend server after 5 seconds.
Additionally we have retries 3 in the config, so upon timing out, it
should have tried another backend server, but it never did (the
retries counter in the log shows 0).
At the time of this log entry, the backend server is responding
properly. For the ~49 seconds prior to the log entry, the backend
server has taken other requests. The backend server is also another
haproxy (same version).

Here's an example of one such log entry:

  198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563
  0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET
  /1/sync/notifications/subscribe?sync_box_id=12345&sender=27B9A93C-F473-4385-A662-352AD34A2453
  HTTP/1.1

The log format is defined as:
%ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\
%sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r

Running "show errors" on the stats socket did not return any
relevant results.

Here's the relevant portions of the haproxy config. It is not the
entire thing as the whole config is 1,513 lines long.

global
  log 127.0.0.1 local0
  maxconn 20480
  user haproxy
  group haproxy
  daemon
  stats socket /var/run/hapi/haproxy/haproxy.sock level admin

defaults
  log global
  mode http
  option httplog
  option dontlognull
  option log-separate-errors
  retries 3
  option redispatch
  timeout connect 5000
  timeout client 6
  timeout server 17
  option clitcpka
  option srvtcpka
  option abortonclose
  option splice-auto
  monitor-uri /haproxy/ping
  stats enable
  stats uri /haproxy/stats
  stats refresh 15
  stats auth user:pass

frontend api
  bind *:80
  bind *:443 ssl crt /etc/haproxy/server.pem
  maxconn 2
  option httpclose
  option forwardfor
  acl internal src 10.0.0.0/8
  acl have_request_id req.fhdr(X-Request-Id) -m found
  http-request set-nice -100 if internal
  http-request add-header X-API-URL %[path] if !internal
  http-request add-header X-Request-Timestamp %Ts.%ms
  http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
internal have_request_id
  http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
!have_request_id
  http-request add-header X-API-Host i-4a3b1c6a
  unique-id-format %{+X}o%pid-%rt
  log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
%ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
  default_backend DEFAULT_404

  acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

  acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
  acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE
CONNECT PATCH
  http-request set-header X-Rewrite-Id nqXn if !rewrite-found
nqXn_path nqXn_method
  acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
  use_backend platform-push if rewrite-nqXn
  reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\
/1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\
/1/sync/subscribe/\2\3 if rewrite-nqXn


backend platform-push
  option httpchk GET /ping
  default-server inter 15s fastinter 1s
  server i-6eaf724d 10.230.23.64:80 check observe layer4
  server i-84d931a5 10.230.42.8:80 check observe layer4






--
Cyril Bonté



Re: haproxy intermittently not connecting to backend

2014-04-02 Thread Willy Tarreau
Hi Patrick,

On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
 We have an issue with haproxy (1.5-dev22-1a34d57) where it is
 intermittently not connecting to the backend server. However the
 behavior it is exhibiting seems strange.
 The reason I say strange is that in one example, it logged that the
 client disconnected after ~49 seconds with connection flags of CC--.
 However our config has timeout connect 5000, so it should have timed
 out connecting to the backend server after 5 seconds. Additionally we
 have retries 3 in the config, so upon timing out, it should have tried
 another backend server, but it never did (the retries counter in the log
 shows 0).

No, retries only affects retries to the same server; it's option redispatch
which allows the last retry to be performed on another server. But you have
it anyway.

 At the time of this log entry, the backend server is responding
 properly. For the ~49 seconds prior to the log entry, the backend server
 has taken other requests. The backend server is also another haproxy
 (same version).
 
 Here's an example of one such log entry:

[fixed version pasted here]

 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 
 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET 
 /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
  HTTP/1.1

OK in fact the client did not wait 49 seconds. If you look closer, you'll
see that the client remained silent for 49 seconds (typically a connection
pool or a preconnect) and closed immediately after sending the request (in
the same millisecond). Since you have option abortonclose, the connection
was aborted before the server had a chance to respond.

So I can easily imagine that you randomly get this error: you're in a race
condition. If the server responds immediately, you win the race and the
request is handled; otherwise it's aborted.
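
Decoding the timers from the log line against your log-format makes this
visible:

    # 49562/0/-1/-1/49563 is %Tq/%Tw/%Tc/%Tr/%Tt:
    #   Tq = 49562  ms spent waiting for the complete request
    #   Tw = 0      ms spent in queues
    #   Tc = -1     the connection to the server never completed
    #   Tr = -1     no response header was ever received
    #   Tt = 49563  ms total session time
    # Tt - Tq = 1 ms: the session ended within a millisecond of the
    # request finally arriving, i.e. the client closed right away.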

Please start by removing option abortonclose, I think it will fix the issue.
Second thing you can do is to remove option httpclose or replace it with
option http-server-close which is active and not just passive. The connections
will last less time on your servers which is always appreciated.

I'm not seeing any other issue, so with just this you should be fine.
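
Concretely, the change would look like this against the config quoted
below (sketch only):

    defaults
      # option abortonclose   # removed: let the server answer even if
                              # the client closes right after the request

    frontend api
      # option httpclose      # replaced by the active close below
      option http-server-close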

 The log format is defined as:
 %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\
 %U/%B\ %ST\ %tsc\ %ID\ +\ %r
 
 Running "show errors" on the stats socket did not return any relevant
 results.
 
 Here's the relevant portions of the haproxy config. It is not the entire
 thing as the whole config is 1,513 lines long.
 
 global
   log 127.0.0.1 local0
   maxconn 20480
   user haproxy
   group haproxy
   daemon
   stats socket /var/run/hapi/haproxy/haproxy.sock level admin
 
 defaults
   log global
   mode http
   option httplog
   option dontlognull
   option log-separate-errors
   retries 3
   option redispatch
   timeout connect 5000
   timeout client 6
   timeout server 17
   option clitcpka
   option srvtcpka
   option abortonclose
   option splice-auto
   monitor-uri /haproxy/ping
   stats enable
   stats uri /haproxy/stats
   stats refresh 15
   stats auth user:pass
 
 frontend api
   bind *:80
   bind *:443 ssl crt /etc/haproxy/server.pem
   maxconn 2
   option httpclose
   option forwardfor
   acl internal src 10.0.0.0/8
   acl have_request_id req.fhdr(X-Request-Id) -m found
   http-request set-nice -100 if internal
   http-request add-header X-API-URL %[path] if !internal
   http-request add-header X-Request-Timestamp %Ts.%ms
   http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
 internal have_request_id
   http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
 !have_request_id
   http-request add-header X-API-Host i-4a3b1c6a
   unique-id-format %{+X}o%pid-%rt
   log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
 %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
   default_backend DEFAULT_404
 
   acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
 
   acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
   acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT
 PATCH
   http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path
 nqXn_method
   acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
   use_backend platform-push if rewrite-nqXn
   reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\
 /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\
 /1/sync/subscribe/\2\3 if rewrite-nqXn
 
 
 backend platform-push
   option httpchk GET /ping
   default-server inter 15s fastinter 1s
   server i-6eaf724d 10.230.23.64:80 check observe layer4
   server i-84d931a5 10.230.42.8:80 check observe layer4
 

Regards,
Willy




Re: haproxy intermittently not connecting to backend

2014-04-02 Thread Patrick Hemmer
That makes perfect sense. Thank you very much.

-Patrick


*From: *Willy Tarreau w...@1wt.eu
*Sent: * 2014-04-02 15:38:04 E
*To: *Patrick Hemmer hapr...@stormcloud9.net
*CC: *haproxy@formilux.org
*Subject: *Re: haproxy intermittently not connecting to backend

 Hi Patrick,

 On Tue, Apr 01, 2014 at 03:20:15PM -0400, Patrick Hemmer wrote:
 We have an issue with haproxy (1.5-dev22-1a34d57) where it is
 intermittently not connecting to the backend server. However the
 behavior it is exhibiting seems strange.
 The reason I say strange is that in one example, it logged that the
 client disconnected after ~49 seconds with connection flags of CC--.
 However our config has timeout connect 5000, so it should have timed
 out connecting to the backend server after 5 seconds. Additionally we
 have retries 3 in the config, so upon timing out, it should have tried
 another backend server, but it never did (the retries counter in the log
 shows 0).
 No, retries only affects retries to the same server; it's option redispatch
 which allows the last retry to be performed on another server. But you have
 it anyway.

 At the time of this log entry, the backend server is responding
 properly. For the ~49 seconds prior to the log entry, the backend server
 has taken other requests. The backend server is also another haproxy
 (same version).

 Here's an example of one such log entry:
 [fixed version pasted here]

 198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563 
 0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET 
 /1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
  HTTP/1.1
 OK in fact the client did not wait 49 seconds. If you look closer, you'll
 see that the client remained silent for 49 seconds (typically a connection
 pool or a preconnect) and closed immediately after sending the request (in
 the same millisecond). Since you have option abortonclose, the connection
 was aborted before the server had a chance to respond.

 So I can easily imagine that you randomly get this error: you're in a race
 condition. If the server responds immediately, you win the race and the
 request is handled; otherwise it's aborted.

 Please start by removing option abortonclose, I think it will fix the issue.
 Second thing you can do is to remove option httpclose or replace it with
 option http-server-close which is active and not just passive. The 
 connections
 will last less time on your servers which is always appreciated.

 I'm not seeing any other issue, so with just this you should be fine.

 The log format is defined as:
 %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\
 %U/%B\ %ST\ %tsc\ %ID\ +\ %r

 Running "show errors" on the stats socket did not return any relevant
 results.

 Here's the relevant portions of the haproxy config. It is not the entire
 thing as the whole config is 1,513 lines long.

 global
   log 127.0.0.1 local0
   maxconn 20480
   user haproxy
   group haproxy
   daemon
   stats socket /var/run/hapi/haproxy/haproxy.sock level admin

 defaults
   log global
   mode http
   option httplog
   option dontlognull
   option log-separate-errors
   retries 3
   option redispatch
   timeout connect 5000
   timeout client 6
   timeout server 17
   option clitcpka
   option srvtcpka
   option abortonclose
   option splice-auto
   monitor-uri /haproxy/ping
   stats enable
   stats uri /haproxy/stats
   stats refresh 15
   stats auth user:pass

 frontend api
   bind *:80
   bind *:443 ssl crt /etc/haproxy/server.pem
   maxconn 2
   option httpclose
   option forwardfor
   acl internal src 10.0.0.0/8
   acl have_request_id req.fhdr(X-Request-Id) -m found
   http-request set-nice -100 if internal
   http-request add-header X-API-URL %[path] if !internal
   http-request add-header X-Request-Timestamp %Ts.%ms
   http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
 internal have_request_id
   http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
 !have_request_id
   http-request add-header X-API-Host i-4a3b1c6a
   unique-id-format %{+X}o%pid-%rt
   log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
 %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
   default_backend DEFAULT_404

   acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

   acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
   acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE CONNECT
 PATCH
   http-request set-header X-Rewrite-Id nqXn if !rewrite-found nqXn_path
 nqXn_method
   acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
   use_backend platform-push if rewrite-nqXn
   reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\
 /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\
 /1/sync/subscribe/\2\3 if rewrite-nqXn


 backend platform-push
   option httpchk GET /ping
   default-server inter 15s fastinter 1s

Re: haproxy intermittently not connecting to backend

2014-04-01 Thread Patrick Hemmer
Apologies, my mail client went stupid. Here's the log entry unmangled:

198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563
0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET
/1/sync/notifications/subscribe?sync_box_id=12496&sender=D7A9F93D-F653-4527-A022-383AD55A1943
HTTP/1.1

-Patrick



*From: *Patrick Hemmer hapr...@stormcloud9.net
*Sent: * 2014-04-01 15:20:15 E
*To: *haproxy@formilux.org
*Subject: *haproxy intermittently not connecting to backend

 We have an issue with haproxy (1.5-dev22-1a34d57) where it is
 intermittently not connecting to the backend server. However the
 behavior it is exhibiting seems strange.
 The reason I say strange is that in one example, it logged that the
 client disconnected after ~49 seconds with connection flags of
 CC--. However our config has timeout connect 5000, so it should
 have timed out connecting to the backend server after 5 seconds.
 Additionally we have retries 3 in the config, so upon timing out, it
 should have tried another backend server, but it never did (the
 retries counter in the log shows 0).
 At the time of this log entry, the backend server is responding
 properly. For the ~49 seconds prior to the log entry, the backend
 server has taken other requests. The backend server is also another
 haproxy (same version).

 Here's an example of one such log entry:

  198.228.211.13:60848 api~ platform-push/i-84d931a5 49562/0/-1/-1/49563
  0/0/0/0/0 0/0 691/212 503 CC-- 4F8E-4624 + GET
  /1/sync/notifications/subscribe?sync_box_id=12345&sender=27B9A93C-F473-4385-A662-352AD34A2453
  HTTP/1.1

 The log format is defined as:
 %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ac/%fc/%bc/%sc/%rc\
 %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r

 Running "show errors" on the stats socket did not return any
 relevant results.
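
 (For reference, a sketch of how this was queried via the stats socket,
 using the socket path from the config below:

     $ echo "show errors" | socat stdio /var/run/hapi/haproxy/haproxy.sock
 )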

 Here's the relevant portions of the haproxy config. It is not the
 entire thing as the whole config is 1,513 lines long.

 global
   log 127.0.0.1 local0
   maxconn 20480
   user haproxy
   group haproxy
   daemon
   stats socket /var/run/hapi/haproxy/haproxy.sock level admin

 defaults
   log global
   mode http
   option httplog
   option dontlognull
   option log-separate-errors
   retries 3
   option redispatch
   timeout connect 5000
   timeout client 6
   timeout server 17
   option clitcpka
   option srvtcpka
   option abortonclose
   option splice-auto
   monitor-uri /haproxy/ping
   stats enable
   stats uri /haproxy/stats
   stats refresh 15
   stats auth user:pass

 frontend api
   bind *:80
   bind *:443 ssl crt /etc/haproxy/server.pem
   maxconn 2
   option httpclose
   option forwardfor
   acl internal src 10.0.0.0/8
   acl have_request_id req.fhdr(X-Request-Id) -m found
   http-request set-nice -100 if internal
   http-request add-header X-API-URL %[path] if !internal
   http-request add-header X-Request-Timestamp %Ts.%ms
   http-request add-header X-Request-Id %[req.fhdr(X-Request-Id)] if
 internal have_request_id
   http-request set-header X-Request-Id %{+X}o%pid-%rt if !internal ||
 !have_request_id
   http-request add-header X-API-Host i-4a3b1c6a
   unique-id-format %{+X}o%pid-%rt
   log-format %ci:%cp\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\
 %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U/%B\ %ST\ %tsc\ %ID\ +\ %r
   default_backend DEFAULT_404

   acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

   acl nqXn_path path_reg ^/1/sync/notifications/subscribe/([^\ ?]*)$
   acl nqXn_method method OPTIONS GET HEAD POST PUT DELETE TRACE
 CONNECT PATCH
   http-request set-header X-Rewrite-Id nqXn if !rewrite-found
 nqXn_path nqXn_method
   acl rewrite-nqXn req.hdr(X-Rewrite-Id) -m str nqXn
   use_backend platform-push if rewrite-nqXn
   reqrep ^(OPTIONS|GET|HEAD|POST|PUT|DELETE|TRACE|CONNECT|PATCH)\
 /1/sync/notifications/subscribe/([^\ ?]*)([\ ?].*|$) \1\
 /1/sync/subscribe/\2\3 if rewrite-nqXn


 backend platform-push
   option httpchk GET /ping
   default-server inter 15s fastinter 1s
   server i-6eaf724d 10.230.23.64:80 check observe layer4
   server i-84d931a5 10.230.42.8:80 check observe layer4