500s with 1.4.18 and 1.5d7

2011-10-03 Thread Hank A. Paulson

I am not sure if these counts are exceeding the "never" threshold:

   500  when haproxy encounters an unrecoverable internal error, such as a
        memory allocation failure, which should never happen

I am not sure what I can do to troubleshoot this since it is in prod :(
Is there a way to set it to core dump and die when it has a 500?

Past few days:
today so far:
     12 -1
 513153 200
    137 206
   8051 302
   1277 304
    127 400
     22 403
    790 404
     35 408
     32 500
      1 503
      7 504

yesterday:
    3456 -1
14697297 200
    4243 206
       1 301
  257865 302
   54130 304
    1579 400
    1002 403
   27800 404
    1438 408
     130 416
    1138 500
       5 501
      18 502
     140 503
    1788 504

day before:
1.4.18:
    514 -1
3221607 200
   1032 206
  55671 302
   3514 304
    283 400
    165 403
   5691 404
    196 408
    198 500
    329 502
  22603 503
  38185 504

1.5d7:
    3704 -1
12350739 200
    3795 206
  220736 302
   31129 304
    1013 400
    1124 403
   27887 404
    1141 408
      17 416
     950 500
      33 502
    1206 503
   39343 504

$ uname -a
Linux filbert 2.6.32.26-175.fc12.x86_64 #1 SMP Wed Dec 1 21:39:34 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux


$ free -m
             total       used       free     shared    buffers     cached
Mem:          4934        626       4307          0         63        216
-/+ buffers/cache:        347       4587
Swap:            0          0          0

$ /usr/sbin/haproxy1418 -vv
HA-Proxy version 1.4.18 2011/09/16
Copyright 2000-2011 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = native
  CC  = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
  OPTIONS = USE_LINUX_SPLICE=1 USE_REGPARM=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.




Re: 500s with 1.4.18 and 1.5d7

2011-10-03 Thread Brane F. Gračnar
On Monday 03 of October 2011 20:09:17 Hank A. Paulson wrote:
> I am not sure if these counts are exceeding the never threshold
>
>    500  when haproxy encounters an unrecoverable internal error, such as a
>         memory allocation failure, which should never happen
>
> I am not sure what I can do to troubleshoot this since it is in prod :(
> Is there a way to set it to core dump and die when it has a 500?

Are you sure that these are not upstream server 500 errors?

Best regards, Brane



Re: 500s with 1.4.18 and 1.5d7

2011-10-03 Thread Hank A. Paulson

On 10/3/11 12:19 PM, Brane F. Gračnar wrote:
> On Monday 03 of October 2011 20:09:17 Hank A. Paulson wrote:
>> I am not sure if these counts are exceeding the never threshold
>>
>>    500  when haproxy encounters an unrecoverable internal error, such as a
>>         memory allocation failure, which should never happen
>>
>> I am not sure what I can do to troubleshoot this since it is in prod :(
>> Is there a way to set it to core dump and die when it has a 500?
>
> Are you sure that these are not upstream server 500 errors?
>
> Best regards, Brane


Good point. I don't know how to tell from the haproxy logs which 500s
originate from haproxy and which are passed through from the backend servers.
I wish there was an easy way to tell, since haproxy 500s are much more
worrisome. Maybe I am missing something...




Re: 500s with 1.4.18 and 1.5d7

2011-10-03 Thread Baptiste
On Mon, Oct 3, 2011 at 11:02 PM, Hank A. Paulson
h...@spamproof.nospammail.net wrote:
> On 10/3/11 12:19 PM, Brane F. Gračnar wrote:
>> On Monday 03 of October 2011 20:09:17 Hank A. Paulson wrote:
>>> I am not sure if these counts are exceeding the never threshold
>>>
>>>    500  when haproxy encounters an unrecoverable internal error, such as a
>>>         memory allocation failure, which should never happen
>>>
>>> I am not sure what I can do to troubleshoot this since it is in prod :(
>>> Is there a way to set it to core dump and die when it has a 500?
>>
>> Are you sure that these are not upstream server 500 errors?
>>
>> Best regards, Brane
>
> Good point, I don't know how to differentiate from the haproxy logs which
> 500s originate from haproxy and which are passed through from the backend
> servers. I wish there was an easy way to tell since haproxy 500s are much
> more worrisome. Maybe I am missing something...



On your log line, you may have a letter in the second character of the
termination state flags.
If there is a letter, it means there has been an issue between
HAProxy and your server.

Cheers



Re: 500s with 1.4.18 and 1.5d7

2011-10-03 Thread Willy Tarreau
On Mon, Oct 03, 2011 at 11:17:14PM +0200, Baptiste wrote:
> On Mon, Oct 3, 2011 at 11:02 PM, Hank A. Paulson
> h...@spamproof.nospammail.net wrote:
>> On 10/3/11 12:19 PM, Brane F. Gračnar wrote:
>>> On Monday 03 of October 2011 20:09:17 Hank A. Paulson wrote:
>>>> I am not sure if these counts are exceeding the never threshold
>>>>
>>>>    500  when haproxy encounters an unrecoverable internal error, such as a
>>>>         memory allocation failure, which should never happen
>>>>
>>>> I am not sure what I can do to troubleshoot this since it is in prod :(
>>>> Is there a way to set it to core dump and die when it has a 500?
>>>
>>> Are you sure that these are not upstream server 500 errors?
>>>
>>> Best regards, Brane
>>
>> Good point, I don't know how to differentiate from the haproxy logs which
>> 500s originate from haproxy and which are passed through from the backend
>> servers. I wish there was an easy way to tell since haproxy 500s are much
>> more worrisome. Maybe I am missing something...
>
> On your log line, you may have a letter in the second character of the
> termination state flags.
> If there is a letter, it means there has been an issue between
> HAProxy and your server.

There are very few situations where haproxy may emit a 500 right now:
  - lack of memory during session_accept(): the session will not be
    logged, so you will not see it in haproxy's logs;

  - internal error: the first char of the flags in the log will *always*
    be I (for internal error);

  - tarpit: if you configured a tarpit and some users are experiencing it,
    you will see the flags PT in the logs (P for proxy, T for tarpit).

There are also other hints. For instance, if you see that the session timers
are still at -1 for the connect time or the response time, it is guaranteed
that it cannot be the server which reported the response since it has not
responded.

I have planned to add two status codes in the future, one to log what the
server reported, and one to log what haproxy sent to the client. It will
make troubleshooting much easier.

Regards,
Willy
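

The hints Willy lists above could be applied to a log file mechanically.
What follows is only a minimal sketch, not anything shipped with haproxy. It
assumes the default HTTP log format ("option httplog"), where the timers
Tq/Tw/Tc/Tr/Tt are followed by the status code, the byte count, two cookie
fields and the four-character termination flags. The regular expression, the
function name classify_500 and the sample log lines are all made up for
illustration; a custom log format would need a different pattern. 500s caused
by a lack of memory in session_accept() are never logged, so they cannot be
found this way.

```python
import re

# Assumed layout (default "option httplog"):
#   ... Tq/Tw/Tc/Tr/Tt status bytes req_cookie resp_cookie FLAGS ...
# The optional "+" covers timers and sizes logged with "option logasap".
LINE_RE = re.compile(
    r" (?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/\+?(?P<tt>-?\d+)"
    r" (?P<status>-?\d+) \+?(?P<bytes>\d+)"
    r" \S+ \S+ (?P<flags>\S{4}) "
)


def classify_500(line):
    """Guess where a logged 500 came from; return None for other lines."""
    m = LINE_RE.search(line)
    if m is None or m.group("status") != "500":
        return None
    flags = m.group("flags")
    if flags[0] == "I":
        # First flag character I: haproxy internal error.
        return "haproxy: internal error"
    if flags[:2] == "PT":
        # P for proxy, T for tarpit.
        return "haproxy: tarpit"
    if int(m.group("tc")) == -1 or int(m.group("tr")) == -1:
        # Connect or response timer still at -1: the server never answered,
        # so it cannot be the one that reported the 500.
        return "haproxy: no server response"
    return "server (probably)"


# Made-up sample lines, for illustration only.
SAMPLES = [
    'haproxy[1]: 10.0.0.1:1234 [03/Oct/2011:20:09:17.000] fe be/srv1 '
    '10/0/30/69/109 500 2750 - - ---- 1/1/1/1/0 0/0 "GET /a HTTP/1.1"',
    'haproxy[1]: 10.0.0.2:1234 [03/Oct/2011:20:09:18.000] fe fe/<NOSRV> '
    '0/-1/-1/-1/8490 500 212 - - PT-- 0/0/0/0/0 0/0 "GET /b HTTP/1.1"',
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(classify_500(sample))
```

Counting the returned labels over a day of logs would split the 500 totals
into those generated by haproxy and those most likely passed through from a
backend server.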