Re: mysql failover and forcing disconnects

2012-05-24 Thread Willy Tarreau
Hi Justin,

On Wed, May 23, 2012 at 03:11:00PM -0700, Justin Karneges wrote:
 (Apologies if this comes through twice. The first time I sent was before 
 subscription approval, and I don't think it went through.)

It was OK, you don't need to be subscribed to post messages.

(...)
 1) Even if haproxy notices within seconds that the mysql master is down, 
 existing connections remain pointed to the master. I set timeout server 5m 
 so that within 5 minutes of inactivity, haproxy will eventually kill the 
 connections, causing clients to reconnect and get routed to the slave. This 
 means that in practice, the failover takes 5 minutes to fully complete. I 
 could reduce this timeout value further, but this does not feel like the ideal
 solution.

There is an option at the server level which is "on-marked-down
shutdown-sessions".
It achieves exactly what you want: it will kill all connections to a server
which is detected as down.
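
(For illustration, a minimal sketch of a server line using that option; the
backend name and addresses are invented:)

    backend mysql
        mode tcp
        server master 10.0.0.1:3306 check on-marked-down shutdown-sessions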

 2) If the master eventually comes back, all connections that ended up routing 
 to the slave will stay on the slave indefinitely. The only solution I have 
 for 
 this is to restart mysql on the slave, which kicks everyone off causing them 
 to 
 reconnect and get routed back to the master. This is acceptable if restoring 
 master required some kind of manual maintenance, since I'd already be getting 
 my hands dirty anyway. However, if master disappears and comes back due to 
 brief network outage that resolves itself automatically, it's unfortunate 
 that 
 I'd still have to manually react to this by kicking everyone off the slave.

There is no universal solution for this. As haproxy doesn't inspect the mysql
traffic, it cannot know when a connection remains idle and unused. Making it
arbitrarily kill connections to a working server would be the worst thing to
do, as it would kill connections on which a transaction is waiting to be
completed.

I really think the best you can do is to have your slave declared as a backup
server and use short enough timeouts so that idle connections expire quickly
and are replaced by new connections to the master server.
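
(Again just a sketch with invented names: the slave declared as a backup, with
a short server timeout so that idle connections expire quickly:)

    backend mysql
        mode tcp
        timeout server 1m
        server master 10.0.0.1:3306 check on-marked-down shutdown-sessions
        server slave  10.0.0.2:3306 check backup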

Regards,
Willy




Re: mysql failover and forcing disconnects

2012-05-24 Thread Justin Karneges
On Wednesday, May 23, 2012 11:57:14 PM Willy Tarreau wrote:
 There is an option at the server level which is "on-marked-down
 shutdown-sessions". It achieves exactly what you want: it will kill all
 connections to a server which is detected as down.

Perfect!

  2) If the master eventually comes back, all connections that ended up
  routing to the slave will stay on the slave indefinitely. The only
  solution I have for this is to restart mysql on the slave, which kicks
  everyone off causing them to reconnect and get routed back to the
  master. This is acceptable if restoring master required some kind of
  manual maintenance, since I'd already be getting my hands dirty anyway.
  However, if master disappears and comes back due to brief network outage
  that resolves itself automatically, it's unfortunate that I'd still have
  to manually react to this by kicking everyone off the slave.
 
 There is no universal solution for this. As haproxy doesn't inspect the
 mysql traffic, it cannot know when a connection remains idle and unused.
 Making it arbitrarily kill connections to a working server would be the
 worst thing to do, as it would kill connections on which a transaction is
 waiting to be completed.

Well, the network could fail at anytime and have a similar effect. I'm not sure 
if killing all connections to the backup is really any worse than killing all 
connections to the non-backup (via on-marked-down). Either way a bunch of 
client errors may occur, but for a scenario that is hopefully rare.

Maybe an on-marked-up shutdown-backup-sessions option would be good.

Justin



Re: mysql failover and forcing disconnects

2012-05-24 Thread Willy Tarreau
On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
 Well, the network could fail at anytime and have a similar effect. I'm not 
 sure 
 if killing all connections to the backup is really any worse than killing all 
 connections to the non-backup (via on-marked-down). Either way a bunch of 
 client errors may occur, but for a scenario that is hopefully rare.

Killing connections when something fails is acceptable to many people,
but killing connections when everything goes well is generally not accepted.

 Maybe an on-marked-up shutdown-backup-sessions option would be good.

I was thinking about something like this, but I still have doubts about
its real usefulness. I don't know what others think here. If there is
real demand for this and people think it serves a real purpose, I'm fine
with accepting a patch to implement it.

Willy




Re: mysql failover and forcing disconnects

2012-05-24 Thread Baptiste
On Thu, May 24, 2012 at 10:59 AM, Willy Tarreau w...@1wt.eu wrote:
 On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
 Well, the network could fail at anytime and have a similar effect. I'm not 
 sure
 if killing all connections to the backup is really any worse than killing all
 connections to the non-backup (via on-marked-down). Either way a bunch of
 client errors may occur, but for a scenario that is hopefully rare.

 Killing connections when something fails is acceptable to many people,
 but killing connections when everything goes well is generally not accepted.

 Maybe an on-marked-up shutdown-backup-sessions option would be good.

 I was thinking about something like this, but I still have doubts about
 its real usefulness. I don't know what others think here. If there is
 real demand for this and people think it serves a real purpose, I'm fine
 with accepting a patch to implement it.

 Willy



It's like the preempt in VRRP and it may make sense for any protocol
relying on long connections, like HTTP tunnel mode, rdp, IMAP/POP,
etc...

To me it makes sense :)

cheers



Re: [ANNOUNCE] haproxy 1.4.21

2012-05-24 Thread Kevin Decherf
Hi,

Just for archive: CVE-2012-2391
http://www.openwall.com/lists/oss-security/2012/05/23/15


Kevin Decherf - M: +33 681194547 - T: @Kdecherf


On Tue, May 22, 2012 at 9:30 PM, Vivek Malik vivek.ma...@gmail.com wrote:

 A recommended upgrade for all production users. While we are not
 (generally) affected by the bugs fixed in this haproxy stable version, I
 recommend updating haproxy.

 I can update haproxy bin in puppet and can check it in (we distribute
 haproxy binary via puppetmaster).

 Aiman,

 Please update puppetmaster when you see fit and also in general, please
 ensure that puppet client is running on all machines.

 Thanks,
 Vivek


 On Mon, May 21, 2012 at 1:43 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi all,

 a number of old bugs were reported recently. Some of them are quite
 problematic because they can lead to crashes while parsing configuration
 or when starting up, which is even worse considering that startup scripts
 will generally not notice it.

 Among the bugs fixed in 1.4.21, we can enumerate:
  - risk of crash if using reqrep/rsprep and having tune.bufsize manually
    configured larger than what was compiled in. The cause is that the trash
    buffer used for the replacement was still static, and I believed this was
    fixed months ago but only my mailbox had the fix! Thanks to Dmitry
    Sivachenko for reporting this bug.

  - risk of crash when using header captures on a TCP frontend. This is a
    configuration issue, and this situation is now correctly detected and
    reported. Thanks to Olufemi Omojola for reporting this bug.

  - risk of crash when some servers are declared with checks in a farm which
    does not use an LB algorithm (eg: option transparent or dispatch).
    This happens when a server state is updated and reported to the non-
    existing LB algorithm. Fortunately, this happens at start-up when
    reporting the servers either up or down, but still it's after the fork
    and too late to be easily recovered from by scripts. Thanks to David
    Touzeau for reporting this bug.

  - balance source did not correctly hash IPv6 addresses, so IPv4
    connections to IPv6 listeners would always get the same result. Thanks
    to Alex Markham for reporting this bug.

  - the connect timeout was not properly reset upon connection establishment,
    resulting in a retry if the timeout struck exactly at the same millisecond
    the connect succeeded. The effect is that if a request was sent as part of
    the connect handshake, it is not available for resend during the retry and
    a response timeout is reported for the server. Note that in practice, this
    only happens with erroneous configurations. Thanks to Yehuda Sadeh for
    reporting this bug.

  - the error captures were wrong if the buffer wrapped, which happens when
    capturing incorrectly encoded chunked responses.

 I also backported Cyril's work on the stats page to allow POST params to
 be
 posted in any order, because I know there are people who script actions on
 this page.

 This release also includes doc cleanups from Cyril, Dmitry Sivachenko and
 Adrian Bridgett.

 Distro packagers will be happy to know that I added explicit checks to
 shut
 gcc warnings about unchecked write() return value in the debug code.

 While it's very likely that almost nobody is affected by the bugs above,
 troubleshooting them is annoying enough to justify an upgrade.

 Sources, Linux/x86 and Solaris/sparc binaries are at the usual location :

site index : http://haproxy.1wt.eu/
sources: http://haproxy.1wt.eu/download/1.4/src/
changelog  : http://haproxy.1wt.eu/download/1.4/src/CHANGELOG
binaries   : http://haproxy.1wt.eu/download/1.4/bin/

 Willy






Re: mysql failover and forcing disconnects

2012-05-24 Thread Justin Karneges
On Thursday, May 24, 2012 01:59:32 AM Willy Tarreau wrote:
 On Thu, May 24, 2012 at 01:12:14AM -0700, Justin Karneges wrote:
  Well, the network could fail at anytime and have a similar effect. I'm
  not sure if killing all connections to the backup is really any worse
  than killing all connections to the non-backup (via on-marked-down).
  Either way a bunch of client errors may occur, but for a scenario that
  is hopefully rare.
 
 Killing connections when something fails is acceptable to many people,
 but killing connections when everything goes well is generally not
 accepted.

This is certainly a sensible philosophy.

I think what makes my scenario special is that the backup server is 
functionally worse than the non-backup. So even though we are discussing a 
destructive response to a positive event, it's the quickest way to get the 
service out of a degraded state.

  Maybe an on-marked-up shutdown-backup-sessions option would be good.
 
 I was thinking about something like this, but I still have doubts about
 its real usefulness. I don't know what others think here. If there is
 real demand for this and people think it serves a real purpose, I'm fine
 with accepting a patch to implement it.

Thanks for being open. I'll mull this over some more and consider making a 
patch.

Justin



Re: could haproxy call redis for a result?

2012-05-24 Thread S Ahmed
Baptiste,

Whenever this feature is implemented, will it work for a specific URL,
like:

subdomain1.example.com

What about by query string?  like:

www.example.com/customer/12345

or

www.example.com/some/path?customerId=12345


Will it work for all the above?

On Tue, May 8, 2012 at 9:38 PM, S Ahmed sahmed1...@gmail.com wrote:

 Yes it is the lookup that I am worried about.


 On Tue, May 8, 2012 at 5:46 PM, Baptiste bed...@gmail.com wrote:

 Hi,

 Willy has just released 1.5-dev9, but unfortunately the track
 functions can't yet track strings (and so URLs).
 I'll let you know once a nightly snapshot could do it and we could
 work on a proof of concept configuration.

 Concerning 250K URLs, that should not be an issue at all to store them.
 Maybe looking for one URL could have a performance impact, we'll see.

 cheers

 On Tue, May 8, 2012 at 10:00 PM, S Ahmed sahmed1...@gmail.com wrote:
  Great.
 
  So any ideas how many URLs one can store in these stick tables before
  it becomes a problem?
 
  Would 250K be something of a concern?
 
 
  On Tue, May 8, 2012 at 11:26 AM, Baptiste bed...@gmail.com wrote:
 
  On Tue, May 8, 2012 at 3:25 PM, S Ahmed sahmed1...@gmail.com wrote:
   Ok that sounds awesome, how will that work though?  i.e. from say
 java,
   how
   will I do that?
  
    From what you're saying it sounds like I will just have to modify the
    response and add a particular header.  And on the flip side, if I want
    to unblock, I'll make an http request with something in the header that
    will unblock it?
  
 
  That's it.
  You'll have to track these headers with ACLs in HAProxy and to update
  the stick table accordingly.
  Then based on the value setup in the stick table, HAProxy can decide
  whether it will allow or reject the request.
 
   When do you think this will go live?
  
 
   In another mail, Willy said he will release 1.5-dev9 today.
   So I guess it won't be too long now. Worst case would be later in the
  week or next week.
 
  cheers
 
 





RE: haproxy conditional healthchecks/failover

2012-05-24 Thread Zulu Chas

  Hi!
 
  I'm trying to use HAproxy to support the concepts of offline, in
  maintenance mode, and not working servers.
 
 Any good reason to do that???
 (I'm a bit curious)

Sure.  I want to be able to mark a machine offline by creating a file (as
opposed to marking it online by creating a file), which is why I can't use
disable-on-404 below.  This covers situations where I need to take a machine
out of public-facing operation for some reason, but perhaps I still want it to
be able to render pages etc -- maybe I'm testing a code deployment once it's
already deployed, in order to verify the system is ready to be marked online.

I also want to be able to mark a machine down for maintenance by creating a
file, maintenance.html, which apache will nicely rewrite URLs to etc. during
critical deployment phases or when performing other maintenance.  In this
case, I don't want it to render pages (usually to replace otherwise
nasty-looking 500 error pages with a nice html facade).

For normal operations, I want the machine to be up.  But if it's not
intentionally placed offline or in maintenance and the machine fails
heartbeat checks, then it is not working and should not receive requests.

Does this make sense?
 
   I have separate health checks
  for each condition and I have been trying to use ACLs to be able to switch
  between backends.  In addition to the fact that this doesn't seem to work,
  I'm also not loving having to repeat the server lists (which are the same)
  for each backend.
 
 Nothing weird here, this is how HAProxy configuration works.
Cool, but variables would be nice to save time and avoid potential 
inconsistencies between sections.
  -- I think it's more like if any of
  these succeed, mark this server online -- and that's what's making this
  scenario complex.
 
 euh, I might be misunderstanding something.
 There is nothing simpler than this: if the health check is successful,
 then the server is considered healthy...

Since it's not strictly binary, as described above, it's a bit more complex.

  frontend staging 0.0.0.0:8080
# if the number of servers *not marked offline* is *less than the total
  number of app servers* (in this case, 2), then it is considered degraded
acl degraded nbsrv(only_online) lt 2
 
 
 This will match 0 and 1
 
# if the number of servers *not marked offline* is *less than one*, the
  site is considered down
acl down nbsrv(only_online) lt 1
 
 
 This will match 0, so both your down and degraded ACLs cover the
 same value (0), which may lead to an issue later.
 
# if the number of servers without the maintenance page is *less than the
  total number of app servers* (in this case, 2), then it is
  considered maintenance mode
acl mx_mode nbsrv(maintenance) lt 2
 
# if the number of servers without the maintenance page is less than 1,
  we're down because everything is in maintenance mode
acl down_mx nbsrv(maintenance) lt 1
 
 
 Same remark as above.
 
 
# if not running at full potential, use the backend that identified the
  degraded state
use_backend only_online if degraded
use_backend maintenance if mx_mode
 
# if we are down for any reason, use the backend that identified that fact
use_backend backup_only if down
use_backend backup_only if down_mx
 
 
 Here is the problem (see above).
 The 2 use_backend above will NEVER match, because the degraded and
 mx_mode ACLs overlap their values!

Why would they never match?  Aren't you saying they *both* should match and 
wouldn't it then take action on the final match and switch the backend to 
maintenance mode?  That's what I want.  Maintenance mode overrides offline mode 
as a failsafe (since it's more restrictive) to prevent page rendering.
 Do you know the disable-on-404 option?
 it may help you make your configuration in the right way (not
 considering a 404 as a healthy response).
 

Yes, but what I actually would need is enable-on-404 :)
Thanks for your feedback!  I'm definitely open to other options, but I'm hoping 
to not have to lose the flexibility described above!
-chaz
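
(For what it's worth, a rough sketch of how the backend switching could be
arranged so the ACL ranges don't overlap, following Baptiste's remark; the
names are taken from the config above and this assumes exactly two app
servers:)

    frontend staging
        bind 0.0.0.0:8080
        acl all_online_down  nbsrv(only_online) eq 0
        acl all_mx_down      nbsrv(maintenance) eq 0
        acl degraded         nbsrv(only_online) eq 1
        acl mx_mode          nbsrv(maintenance) eq 1

        use_backend backup_only if all_online_down
        use_backend backup_only if all_mx_down
        use_backend maintenance if mx_mode
        use_backend only_online if degraded
        default_backend only_online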
  

Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
Hi,
We're having odd behavior (apparently we always have, but didn't realize it), where
our backend httpchks time out:

May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
servers left. 0 sessions active, 0 requeued, 0 remaining in queue.


We've been playing with the timeout values, and we don't know what is 
controlling the "Layer7 timeout, check duration: 1002ms".  The backend service
availability check (by hand) typically takes 2-3 seconds on average.
Here is the relevant haproxy setup.

#-
# Global settings
#-
global
log-send-hostname opsslb1
log 127.0.0.1 local1 info
#chroot  /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 1024
userhaproxy
group   haproxy
daemon

#-
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#-
defaults
modehttp
log global
option  dontlognull
option  httpclose
option  httplog
option  forwardfor
option  redispatch
timeout connect 500 # default 10 second time out if a backend is not found
timeout client 5
timeout server 360
maxconn 6
retries 3

frontend webapp_ops_ft

bind 10.0.40.209:80
default_backend webapp_ops_bk

backend webapp_ops_bk
balance roundrobin
option httpchk HEAD /app/availability
reqrep ^Host:.* Host:\ webapp.example.com
server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
server webapp_ops3 opsapp3.ops.example.com:41000 check inter 3
timeout check 15000
timeout connect 15000

Kevin Lange
kevin.m.la...@nasa.gov
kla...@raytheon.com
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com





Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
 Hi,
 We're having odd behavior (apparently have always but didn't realize it), 
 where our backend httpchks time out:
 
 May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 
 
 We've been playing with the timeout values, and we don't know what is 
 controlling the Layer7 timeout, check duration: 1002ms.  The backend 
 service availability check (by hand) typically takes 2-3 seconds on average.
 Here is the relevant haproxy setup.
 
 #-
 # Global settings
 #-
 global
 log-send-hostname opsslb1
 log 127.0.0.1 local1 info
 #chroot  /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 maxconn 1024
 userhaproxy
 group   haproxy
 daemon
 
 #-
 # common defaults that all the 'listen' and 'backend' sections will
 # use if not designated in their block
 #-
 defaults
 modehttp
 log global
 option  dontlognull
 option  httpclose
 option  httplog
 option  forwardfor
 option  redispatch
 timeout connect 500 # default 10 second time out if a backend is not found
 timeout client 5
 timeout server 360
 maxconn 6
 retries 3
 
 frontend webapp_ops_ft
 
 bind 10.0.40.209:80
 default_backend webapp_ops_bk
 
 backend webapp_ops_bk
 balance roundrobin
 option httpchk HEAD /app/availability
 reqrep ^Host:.* Host:\ webapp.example.com
 server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
 server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
 server webapp_ops3 opsapp3.ops.example.com:41000 check inter 3
 timeout check 15000
 timeout connect 15000

This is quite strange. The timeout is defined first by "timeout check" or, if
unset, by "inter". So in your case you should observe a 15-second timeout, not
one second.
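
(A minimal illustration of that rule, with invented names and values:)

    backend app_bk
        option httpchk HEAD /app/availability
        # per the rule above: "timeout check" (15s here) defines the check
        # timeout; if it were unset, "inter" would be used instead
        timeout check 15000
        server app1 10.0.0.10:80 check inter 30000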

What exact version is this ? (haproxy -vv)

It looks like a bug, however it could be a bug in the timeout handling as
well as in the reporting. I'd suspect the latter since you're saying that
the service takes 2-3 sec to respond and you don't seem to see errors
that often.

Regards,
Willy




Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
On Thu, May 24, 2012 at 04:31:39PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
 I thought it was a bug in the reporting, considering we've played with 
 numerous values for the various timeouts as an experiment, but wanted your 
 thoughts.
 This is v1.4.15.
 
  [root@opsslb1 log]# haproxy -v
 HA-Proxy version 1.4.15 2011/04/08
 Copyright 2000-2010 Willy Tarreau w...@1wt.eu

OK, I'll try to reproduce. There have been a number of fixes since 1.4.15
BTW, but none of them look like what you observe. Still it would be
reasonable to consider an upgrade to 1.4.21.

Regards,
Willy




Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
I've already put an upgrade to haproxy in place.


Kevin M Lange
Mission Operations and Services
NASA EOSDIS Evolution and Development
Intelligence and Information Systems
Raytheon Company

+1 (301) 851-8450 (office)
+1 (301) 807-2457 (cell)
kevin.m.la...@nasa.gov
kla...@raytheon.com

5700 Rivertech Court
Riverdale, Maryland 20737

- Reply message -
From: Willy Tarreau w...@1wt.eu
Date: Thu, May 24, 2012 5:59 pm
Subject: Problems with layer7 check timeout
To: Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] kevin.m.la...@nasa.gov
Cc: haproxy@formilux.org haproxy@formilux.org

On Thu, May 24, 2012 at 04:31:39PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
 I thought it was a bug in the reporting, considering we've played with 
 numerous values for the various timeouts as an experiment, but wanted your 
 thoughts.
 This is v1.4.15.

  [root@opsslb1 log]# haproxy -v
 HA-Proxy version 1.4.15 2011/04/08
 Copyright 2000-2010 Willy Tarreau w...@1wt.eu

OK, I'll try to reproduce. There have been a number of fixes since 1.4.15
BTW, but none of them look like what you observe. Still it would be
reasonable to consider an upgrade to 1.4.21.

Regards,
Willy



patch: on-marked-up option

2012-05-24 Thread Justin Karneges
Hi,

This implements the feature discussed in the earlier thread: killing
connections on backup servers when a non-backup server comes back up. For
example, you can use this to route to a mysql master & slave and ensure
clients don't stay on the slave after the master goes from down to up. I've
done some minimal testing and it seems to work.

Today is the first time I ever looked at haproxy's code but this feature seemed 
straightforward enough to implement. I hope I've done it properly.

Justin
diff --git a/include/types/checks.h b/include/types/checks.h
index fd15c95..250a68f 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -80,6 +80,12 @@ enum {
 };
 
 enum {
+	HANA_ONMARKEDUP_NONE	= 0,
+
+	HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS,	/* Shutdown peer sessions */
+};
+
+enum {
 	HANA_OBS_NONE		= 0,
 
 	HANA_OBS_LAYER4,		/* Observe L4 - for example tcp */
diff --git a/include/types/server.h b/include/types/server.h
index aa2c4f8..1885eab 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -120,7 +120,8 @@ struct server {
 	int rise, fall;/* time in iterations */
 	int consecutive_errors_limit;		/* number of consecutive errors that triggers an event */
 	short observe, onerror;			/* observing mode: one of HANA_OBS_*; what to do on error: on of ANA_ONERR_* */
-	short onmarkeddown;			/* what to do when marked down: on of HANA_ONMARKEDDOWN_* */
+	short onmarkeddown;			/* what to do when marked down: one of HANA_ONMARKEDDOWN_* */
+	short onmarkedup;			/* what to do when marked up: one of HANA_ONMARKEDUP_* */
 	int inter, fastinter, downinter;	/* checks: time in milliseconds */
 	int slowstart;/* slowstart time in seconds (ms in the conf) */
 	int result;/* health-check result : SRV_CHK_* */
diff --git a/include/types/session.h b/include/types/session.h
index f1b7451..a098002 100644
--- a/include/types/session.h
+++ b/include/types/session.h
@@ -67,6 +67,7 @@
 #define SN_ERR_INTERNAL	0x7000	/* the proxy encountered an internal error */
 #define SN_ERR_DOWN	0x8000	/* the proxy killed a session because the backend became unavailable */
 #define SN_ERR_KILLED	0x9000	/* the proxy killed a session because it was asked to do so */
+#define SN_ERR_UP	0xa000	/* the proxy killed a session because a preferred backend became available */
 #define SN_ERR_MASK	0xf000	/* mask to get only session error flags */
 #define SN_ERR_SHIFT	12		/* bit shift */
 
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 5bd2cfc..92dd094 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -4392,6 +4392,18 @@ stats_error_parsing:
 
 				cur_arg += 2;
 			}
+			else if (!strcmp(args[cur_arg], "on-marked-up")) {
+				if (!strcmp(args[cur_arg + 1], "shutdown-backup-sessions"))
+					newsrv->onmarkedup = HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS;
+				else {
+					Alert("parsing [%s:%d]: '%s' expects 'shutdown-backup-sessions' but got '%s'\n",
+						file, linenum, args[cur_arg], args[cur_arg + 1]);
+					err_code |= ERR_ALERT | ERR_FATAL;
+					goto out;
+				}
+
+				cur_arg += 2;
+			}
 			else if (!strcmp(args[cur_arg], "error-limit")) {
 				if (!*args[cur_arg + 1]) {
 					Alert("parsing [%s:%d]: '%s' expects an integer argument.\n",
diff --git a/src/checks.c b/src/checks.c
index febf77e..5299e98 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -358,15 +358,26 @@ static int check_for_pending(struct server *s)
 	return xferred;
 }
 
-/* Shutdown connections when their server goes down.
+/* Shutdown all connections of a server
  */
-static void shutdown_sessions(struct server *srv)
+static void shutdown_sessions(struct server *srv, int why)
 {
 	struct session *session, *session_bck;
 
 	list_for_each_entry_safe(session, session_bck, &srv->actconns, by_srv)
 		if (session->srv_conn == srv)
-			session_shutdown(session, SN_ERR_DOWN);
+			session_shutdown(session, why);
+}
+
+/* Shutdown all connections of all backup servers of a proxy
+ */
+static void shutdown_backup_sessions(struct proxy *px, int why)
+{
+	struct server *srv;
+
+	for (srv = px->srv; srv != NULL; srv = srv->next)
+		if (srv->state & SRV_BACKUP)
+			shutdown_sessions(srv, why);
 }
 
 /* Sets server s down, notifies by all available means, recounts the
@@ -394,7 +405,7 @@ void set_server_down(struct server *s)
 			s->proxy->lbprm.set_server_status_down(s);
 
 		if (s->onmarkeddown & HANA_ONMARKEDDOWN_SHUTDOWNSESSIONS)
-			shutdown_sessions(s);
+			shutdown_sessions(s, SN_ERR_DOWN);
 
 		/* we might have sessions queued on this server and waiting for
 		 * a connection. Those which are redispatchable will be queued
@@ -465,6 +476,9 @@ void set_server_up(struct server *s) {
 		s->state |= SRV_RUNNING;
 		s->state &= ~SRV_MAINTAIN;
 
+		if (s->onmarkedup & HANA_ONMARKEDUP_SHUTDOWNBACKUPSESSIONS)
+			shutdown_backup_sessions(s->proxy, SN_ERR_UP);
+
 		if (s->slowstart > 0) {
 			s->state |= SRV_WARMINGUP;
 			if (s->proxy->lbprm.algo & BE_LB_PROP_DYN) {
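
(Purely as an illustration of how the new keyword could be used once the patch
is applied; the backend name and addresses are invented:)

    backend mysql
        mode tcp
        server master 10.0.0.1:3306 check on-marked-down shutdown-sessions on-marked-up shutdown-backup-sessions
        server slave  10.0.0.2:3306 check backup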


Re: Problems with layer7 check timeout

2012-05-24 Thread Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
Monsieur Tarreau,

Actually, we are seeing frontend service availability flapping. This morning 
particularly.  Missing from my snippet is the logic for an unplanned outage 
landing page, which our customers were seeing this morning, so haproxy truly
is timing out and marking each backend as down until there are no backend 
servers available, throwing up the unplanned outage landing page.

I'll send more logs and details when I analyze later.

Regards,
Kevin Lange


Kevin M Lange
Mission Operations and Services
NASA EOSDIS Evolution and Development
Intelligence and Information Systems
Raytheon Company

+1 (301) 851-8450 (office)
+1 (301) 807-2457 (cell)
kevin.m.la...@nasa.gov
kla...@raytheon.com

5700 Rivertech Court
Riverdale, Maryland 20737

- Reply message -
From: Willy Tarreau w...@1wt.eu
Date: Thu, May 24, 2012 5:18 pm
Subject: Problems with layer7 check timeout
To: Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] kevin.m.la...@nasa.gov
Cc: haproxy@formilux.org haproxy@formilux.org

Hi Kevin,

On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
 Hi,
 We're having odd behavior (apparently have always but didn't realize it), 
 where our backend httpchks time out:

 May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.


 We've been playing with the timeout values, and we don't know what is 
 controlling the Layer7 timeout, check duration: 1002ms.  The backend 
 service availability check (by hand) typically takes 2-3 seconds on average.
 Here is the relevant haproxy setup.

 #-
 # Global settings
 #-
 global
 log-send-hostname opsslb1
 log 127.0.0.1 local1 info
 #chroot  /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 maxconn 1024
 userhaproxy
 group   haproxy
 daemon

 #-
 # common defaults that all the 'listen' and 'backend' sections will
 # use if not designated in their block
 #-
 defaults
 modehttp
 log global
 option  dontlognull
 option  httpclose
 option  httplog
 option  forwardfor
 option  redispatch
 timeout connect 500 # default 10 second time out if a backend is not found
 timeout client 5
 timeout server 360
 maxconn 6
 retries 3

 frontend webapp_ops_ft

 bind 10.0.40.209:80
 default_backend webapp_ops_bk

 backend webapp_ops_bk
 balance roundrobin
 option httpchk HEAD /app/availability
 reqrep ^Host:.* Host:\ webapp.example.com
 server webapp_ops1 opsapp1.ops.example.com:41000 check inter 3
 server webapp_ops2 opsapp2.ops.example.com:41000 check inter 3
 server webapp_ops3 opsapp3.ops.example.com:41000 check inter 

Re: Problems with layer7 check timeout

2012-05-24 Thread Baptiste
Hi Lange,

Would it be possible to take a trace (tcpdump) of the health check?
This may help as well.

Cheers


On Fri, May 25, 2012 at 4:01 AM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON
COMPANY] kevin.m.la...@nasa.gov wrote:
 Monsieur Tarreau,

 Actually, we are seeing frontend service availability flapping. This morning
 particularly.  Missing from my snippet is the logic for an unplanned outage
 landing page, which our customers were seeing this morning, so haproxy
 truly is timing out and marking each backend as down until there are no
 backend servers available, throwing up the unplanned outage landing page.

 I'll send more logs and details when I analyze later.

 Regards,
 Kevin Lange


 
 Kevin M Lange
 Mission Operations and Services
 NASA EOSDIS Evolution and Development
 Intelligence and Information Systems
 Raytheon Company

 +1 (301) 851-8450 (office)
 +1 (301) 807-2457 (cell)
 kevin.m.la...@nasa.gov
 kla...@raytheon.com

 5700 Rivertech Court
 Riverdale, Maryland 20737

 - Reply message -
 From: Willy Tarreau w...@1wt.eu
 Date: Thu, May 24, 2012 5:18 pm
 Subject: Problems with layer7 check timeout
 To: Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY]
 kevin.m.la...@nasa.gov
 Cc: haproxy@formilux.org haproxy@formilux.org

 Hi Kevin,

 On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M.
 (GSFC-423.0)[RAYTHEON COMPANY] wrote:
 Hi,
 We're having odd behavior (apparently have always but didn't realize it),
 where our backend httpchks time out:

 May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
 servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
 servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
 servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
 May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
 DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
 May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
 DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
 servers left. 0 sessions active, 0 requeued, 0 remaining in queue.


 We've been playing with the timeout values, and we don't know what is
 controlling the Layer7 timeout, check duration: 1002ms.  The backend
 service availability check (by hand) typically takes 2-3 seconds on average.
 Here is the relevant haproxy setup.

 #-
 # Global settings
 #-
 global
 log-send-hostname opsslb1
 log 127.0.0.1 local1 info
 #    chroot  /var/lib/haproxy
 pidfile /var/run/haproxy.pid
 maxconn 1024
 user    haproxy
 group   haproxy
 daemon

 #-
 # common defaults that all the 'listen' and 'backend' sections will
 # use if not designated in their block
 #-
 defaults
 mode    http
 log global
 option  dontlognull
 option  httpclose
 option  httplog
 option  forwardfor
 option  redispatch
 timeout connect 500 # default 10 second time out if a backend is not
 found
 timeout client 5
 timeout server 360
 maxconn 6
 retries 3

 frontend webapp_ops_ft

 bind 10.0.40.209:80
 default_backend webapp_ops_bk

 backend webapp_ops_bk
 balance roundrobin
 option httpchk HEAD /app/availability
 reqrep ^Host:.* Host:\ 

Re: could haproxy call redis for a result?

2012-05-24 Thread Baptiste
Hi,

I'm just guessing, but to me it can work for URLs only, so in your
case it will match /, /customer/12345, and
/some/path?customerId=12345.
For now, the string table can't hold a concatenation of two pieces of
information (the Host header and the URL, in your case).
But who knows, maybe this feature will arrive soon too :)

cheers
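
(Purely speculative at this point, since string tracking isn't available yet:
once it lands, a URL-keyed setup might look roughly like this; the table size,
fetch names and directives below are illustrative assumptions, not a released
syntax:)

    backend web
        # hypothetical: track the request path in a string-keyed stick table
        stick-table type string len 128 size 250k expire 30m store gpc0
        http-request track-sc0 path
        # deny requests whose tracked path has been flagged (gpc0 > 0)
        acl blocked sc0_get_gpc0 gt 0
        http-request deny if blocked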



On Thu, May 24, 2012 at 6:33 PM, S Ahmed sahmed1...@gmail.com wrote:
 Baptiste,

 Whenever this feature will be implemented, will it work for a specific url
 like:

 subdomain1.example.com

 What about by query string?  like:

 www.example.com/customer/12345

 or

 www.example.com/some/path?customerId=12345


 Will it work for all the above?

 On Tue, May 8, 2012 at 9:38 PM, S Ahmed sahmed1...@gmail.com wrote:

 Yes it is the lookup that I am worried about.


 On Tue, May 8, 2012 at 5:46 PM, Baptiste bed...@gmail.com wrote:

 Hi,

 Willy has just released 1.5-dev9, but unfortunately the track
 functions can't yet track strings (and so URLs).
 I'll let you know once a nightly snapshot could do it and we could
 work on a proof of concept configuration.

 Concerning 250K URLs, that should not be an issue at all to store them.
 Maybe looking for one URL could have a performance impact, we'll see.

 cheers

 On Tue, May 8, 2012 at 10:00 PM, S Ahmed sahmed1...@gmail.com wrote:
  Great.
 
  So any ideas how many urls one can story in these sticky tables before
  it
  becomes a problem?
 
  Would 250K be something of a concern?
 
 
  On Tue, May 8, 2012 at 11:26 AM, Baptiste bed...@gmail.com wrote:
 
  On Tue, May 8, 2012 at 3:25 PM, S Ahmed sahmed1...@gmail.com wrote:
   Ok that sounds awesome, how will that work though?  i.e. from say
   java,
   how
   will I do that?
  
   From what your saying it sounds like I will just have to modify the
   response
   add and a particular header.  And on the flip side, if I want to
   unblock
   I'll make a http request with something in the header that will
   unblock
   it?
  
 
  That's it.
  You'll have to track these headers with ACLs in HAProxy and to update
  the stick table accordingly.
  Then based on the value setup in the stick table, HAProxy can decide
  whether it will allow or reject the request.
 
   When do you think this will go live?
  
 
  In an other mail, Willy said he will release 1.5-dev9 today.
  So I guess it won't be too long now. Worste case would be later in the
  week or next week.
 
  cheers
 
 






CISCO Workshop on: Strategic Management of Technology and Resources to Increase Attorney Productivity

2012-05-24 Thread Ellena Wright
Dear Reader,

Invitation to attend GOAL’s next workshop led by - Tanya Vaislev,
Senior Manager, Legal Cisco Systems, Inc., USA

Topic: Strategic Management of Technology and Resources to Increase
Attorney Productivity

Venue: At your desk: On your Laptop/PC or Phone
Date: 29 May 2012
Time: 9:00 am PDT/11:00 am CDT/12:00 pm EDT
Duration: 60 minutes including Q & A
Live Participation & Podcast: US$ 49 (Free for Gold Members and General
Counsels)
Access the Recorded Version (Podcast) only: US$ 49

The purchase price of this event will also give you FREE access to one
additional LPO/IP Offshoring Podcasts from our library (same or less
price!). Over and above, upon your registration you will be awarded
with the Silver Membership at GOAL, absolutely free of cost.

For registration, please contact: +1-562-366-4706 or reply to this
email.


Best regards,
Ellena Wright
Executive, Global Outsourcing Association of Lawyers (GOAL)

PS: To access the library of Podcasts of our previous Legal/IP
Offshoring webinars, please indicate. We can send you the requisite
information.

If you don’t want to receive emails, please reply to this email with
the subject line ‘unsubscribe’.



Re: Problems with layer7 check timeout

2012-05-24 Thread Willy Tarreau
Hi Kevin,

On Thu, May 24, 2012 at 09:01:43PM -0500, Lange, Kevin M. (GSFC-423.0)[RAYTHEON 
COMPANY] wrote:
 Monsieur Tarreau,
 
 Actually, we are seeing frontend service availability flapping. This morning
 particularly.  Missing from my snippet is the logic for an unplanned outage
 landing page, which our customers were seeing this morning, so haproxy
 truly is timing out and marking each backend as down until there are no
 backend servers available, throwing up the unplanned outage landing page.

I'm not surprised, if you observe that your checks need more than 1 second
to complete :-/

I have tested your configuration here. First, I can say it's not bad reporting,
as the timers in the logs are the correct ones. Second, I noticed a bug: when
both "timeout check" and "timeout connect" are set, haproxy uses the larger
of the two for the check! The reason is that it did not displace the task in
the queue upon connection establishment if the check timeout is smaller than
the connect timeout. I've fixed it now.

This aside, I could not reproduce the issue here. I agree with Baptiste, a
tcpdump would certainly help. Maybe we're facing corner case issues which
I did not test, such as servers sending partial responses or things like
this, I don't know.

Regards,
Willy