Re: Backend redirection issue - log interpretation help please

2012-07-27 Thread Richard Stanford
Cyril, thanks for your response.

On Jul 26, 2012, at 4:46 PM, Cyril Bonté wrote:

 
 Please add log global in your backend sections (or in your defaults),
 this explains why your log files didn't give you any indication.
 After adding this line everywhere (not only in the frontend), you'll see when 
 servers go UP and DOWN, and why.
 This will also probably help us to know what happens.

Done.  That helps - we tend to be a bit minimal on most logging for compliance 
reasons but that's certainly going to help.

 From this previous log line, it looks like something becomes slow (your 
 haproxy server or your backend servers).

That's the odd piece, because the server logs don't seem to indicate any issues 
(and this was last seen shortly after a restart).  Hopefully the logging will 
shed more light on the subject.

 Wow, are you sure you really want to use such a big buffer size ? Also, 
 ensure you're running the last stable version of HAProxy (currently 1.4.21), 
 which fixes a major bug when using a larger buffer size (it doesn't explain 
 what you observe but it's an advise for more stability).

Unfortunately yes; we are supporting some rare but critical very large HTTP GET 
requests over a REST API.  I'll look into upgrading shortly.

 For more details :
 http://haproxy.1wt.eu/git?p=haproxy-1.4.git;a=commit;h=30297cb17147a8d339eb160226bcc08c91d9530b

Good to know!

 As said at the beginning, please add :
   log global

Added.  Is there a useful log level configuration that outputs exactly what the 
defaults are without the stats connection lines?

 If using cookies is not an issue for your clients, I'd recommend you not to 
 use appsession but cookie insert or cookie prefix instead.
 
 http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#4-cookie

We'll probably leave this one just because I don't believe its giving us any 
issues at the moment; the client problem definitely coincided with the backend 
being marked as DOWN rather than a cookie issue (the cookie issue was my first 
guess to be honest).

 server gui1 172.25.200.53:8080 check maxconn 2000
 server gui2 172.25.200.54:8080 check maxconn 2000
 
 You didn't provide any timeout check nor inter value.
 The default will be 2 seconds, which is maybe too low for your case.

It shouldn't be - our healthcheck page is fairly simple and just basically 
makes sure that our webapp is responding to requests (barely more than a static 
file) - I've upped timeout check to 1 though, so we'll see if that makes 
a difference.

 Hope this helps.

It did, and thank you for looking at this.  I've learned an awful lot about 
haproxy configuration setups (good and bad) from this list!

-Richard



Backend redirection issue - log interpretation help please

2012-07-26 Thread Richard Stanford
I'm having some trouble interpreting an HAProxy log file line - it doesn't seem 
to make much sense to me, so I'm fairly certain that I'm reading it 
incorrectly.  My full configuration file is at the end of this email.  
Symptomatically, I've been seeing situations in which people are briefly 
redirected unexpectedly onto a different 'gui' backend server.  Its rare, but 
when it happens it often appears at the same point in the GUI use flow.  This 
may be a red herring however.

This morning I was able to reproduce it even after restarting our appserver 
proceses and our haproxy process (which had been up for an extended period of 
time - thanks!).  When it reproduced, I noticed that the backend appserver was 
marked as DOWN in the statistics report even though the logfiles gave no 
indication that there were any problems. From the user's experience, the system 
hung waiting for a response for some 40 seconds and then showed a login 
screen, which was served when the incorrect backend received a page request.  
Simply re-requesting the previous page was successful, as the previously 
authenticated backend was shown as UP by that point.

In the logfile we see the following entry at the appropriate time:

Jul 26 06:21:09 localhost haproxy[3853]: 172.25.200.177:3124 
[26/Jul/2012:06:21:01.826] secure gui/gui2 4993/0/0/-1/7992 -1 0 - - CH-- 
40/40/1/1/0 0/0 HEAD /panel/healthcheck.html HTTP/1.1

This file - while similar to the HAProxy healthcheck URL (and in fact produced 
the same way on the backend server) - is actually the one being requested by 
our hardware LBs (we use existing hardware LBs to distribute between machines, 
and a single haproxy daemon on each box to distribute to appservers within each 
machine).  It appears that it saw some sort of timeout, but I'm not sure why 
that would have caused haproxy to mark the backend as DOWN.  We add backend 
server markers to our generated pages, so I'm confident that the hardware LB 
kept us on the same physical system the entire time.

Needless to say I'm a little confused as to what I'm seeing.  If anyone can 
shed a little more light on the subject it would be very much appreciated!  
Other than this recurrent issue, everything's been working incredibly well.

Configuration file below in full in case there's something odd:

global
maxconn 2
tune.bufsize 128000
tune.maxrewrite 8192
ulimit-n 6
stats socket /tmp/haproxy mode 0777 level admin
log 127.0.0.1 local0

defaults
mode http
timeout connect 5s
timeout client 1m
timeout server 10m
option srvtcpka
option http-server-close
option http-pretend-keepalive
option redispatch
option abortonclose
option httpchk HEAD /panel/healthcheck.txt HTTP/1.0

backend gui
balance source
appsession JSESSIONID len 64 timeout 3h
server gui1 172.25.200.53:8080 check maxconn 2000
server gui2 172.25.200.54:8080 check maxconn 2000

backend platform
balance roundrobin
server platform1 172.25.200.50:8080 check maxconn 2000 
server platform2 172.25.200.51:8080 check maxconn 2000
server platform3 172.25.200.52:8080 check maxconn 2000

frontend secure
bind 172.25.200.70:8080
maxconn 2
option forwardfor
option clitcpka
acl javascript url_beg   /js
acl widgetsurl_beg   /widgets
acl reportsurl_beg   /datafeed
acl deny   url_beg   /server-status /balancer-manager /jmx-console 
/web-console /invoker
block if deny
default_backend gui
use_backend platform if javascript or widgets or reports
log global
option httplog
option dontlognull
option dontlog-normal

listen stats :7583
mode http
stats uri /






Re: Backend redirection issue - log interpretation help please

2012-07-26 Thread Cyril Bonté

Hi Richard,

Le 26/07/2012 13:46, Richard Stanford a écrit :

This morning I was able to reproduce it even after restarting our
appserver proceses and our haproxy process (which had been up for an
extended period of time - thanks!).  When it reproduced, I noticed that
the backend appserver was marked as DOWN in the statistics report even
though the logfiles gave no indication that there were any problems.


Please add log global in your backend sections (or in your defaults),
this explains why your log files didn't give you any indication.
After adding this line everywhere (not only in the frontend), you'll see 
when servers go UP and DOWN, and why.

This will also probably help us to know what happens.


 From the user's experience, the system hung waiting for a response
for some 40 seconds and then showed a login screen, which was served
when the incorrect backend received a page request.  Simply
re-requesting the previous page was successful, as the previously
authenticated backend was shown as UP by that point.

In the logfile we see the following entry at the appropriate time:

Jul 26 06:21:09 localhost haproxy[3853]: 172.25.200.177:3124
[26/Jul/2012:06:21:01.826] secure gui/gui2 4993/0/0/-1/7992 -1 0 - -
CH-- 40/40/1/1/0 0/0 HEAD /panel/healthcheck.html HTTP/1.1


From this previous log line, it looks like something becomes slow (your 
haproxy server or your backend servers).



Configuration file below in full in case there's something odd:

global
 maxconn 2
 tune.bufsize 128000


Wow, are you sure you really want to use such a big buffer size ?
Also, ensure you're running the last stable version of HAProxy 
(currently 1.4.21), which fixes a major bug when using a larger buffer 
size (it doesn't explain what you observe but it's an advise for more 
stability).


For more details :
http://haproxy.1wt.eu/git?p=haproxy-1.4.git;a=commit;h=30297cb17147a8d339eb160226bcc08c91d9530b


defaults
 mode http
 timeout connect 5s
 timeout client 1m
 timeout server 10m
 option srvtcpka
 option http-server-close
 option http-pretend-keepalive
 option redispatch
 option abortonclose
 option httpchk HEAD /panel/healthcheck.txt HTTP/1.0


As said at the beginning, please add :
   log global


backend gui
 balance source
 appsession JSESSIONID len 64 timeout 3h


If using cookies is not an issue for your clients, I'd recommend you not 
to use appsession but cookie insert or cookie prefix instead.


http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#4-cookie


 server gui1 172.25.200.53:8080 check maxconn 2000
 server gui2 172.25.200.54:8080 check maxconn 2000


You didn't provide any timeout check nor inter value.
The default will be 2 seconds, which is maybe too low for your case.

http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#inter
http://cbonte.github.com/haproxy-dconv/configuration-1.4.html#timeout%20check

Hope this helps.

--
Cyril Bonté