Re: maintenance mode and server affinity

2011-08-02 Thread Willy Tarreau
Hi James,

On Mon, Aug 01, 2011 at 04:05:41PM -0400, James Bardin wrote:
> I have a number of instances using tcp mode, and a stick-table on src
> ip for affinity. When a server is in maintenance mode, clients with an
> existing affinity will still connect to the disabled server, and only
> be re-dispatched if the connection fails (and error responses from the
> backend are still successful tcp connections).

Are you sure your server was set in maintenance mode, and did you not
just set its weight to zero?

There is a big difference between zero weight and maintenance mode:
  - zero weight means the server is not selected by the load balancing
    algorithm, which means it will not get any new visitors, but will
    still get existing visitors;

  - maintenance means the server is offline and must not receive
    any traffic at all, except the admin's tests selected with
    force-persist rules.

So if this is not what you're observing, then it's a bug and we need
to see how to reproduce it in order to fix it.
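
For reference, both states can be driven at runtime from the stats socket
on recent 1.4 releases. A minimal sketch (the socket path and the
backend/server names are placeholders, and "disable server" needs a socket
declared with "level admin"):

    # weight 0: no new visitors, existing persistent visitors still served
    echo "set weight bk_app/web1 0" | socat stdio /var/run/haproxy.stat

    # maintenance: no traffic at all (except force-persist matches)
    echo "disable server bk_app/web1" | socat stdio /var/run/haproxy.stat

    # back in service
    echo "enable server bk_app/web1" | socat stdio /var/run/haproxy.stat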

> I've done a few things to stop this traffic when needed:
>  - drop the packets on the load balancer with a null route or iptables.
>  - block the packets with the firewall on the backend server, and let
>    the clients get re-dispatched.
>  - shut down the services that could respond on the backend, and
>    re-dispatch.
>
> Have I missed any configuration in haproxy that will completely stop
> traffic to a backend? I have no problem managing this as-is myself,
> but having fewer pieces involved makes delegating administration
> responsibilities easier.

I agree with you. The maintenance mode was done exactly for what you
need, so I want to ensure it works.

> Willy, is a "block server" option (or maybe a "drop table" to get rid
> of affinity sessions) something that could be implemented?

I think the latter can be done on the stats socket using "clear table",
because you can specify a rule to select which entries to clear, so you
can clear any entry matching your server's ID. But it's only in 1.5, not
in a stable release.
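
As an illustration only (the table name and server ID are placeholders),
on a 1.5-dev stats socket that would look like:

    # drop all stick-table entries currently pointing to server ID 2
    echo "clear table bk_app data.server_id eq 2" | socat stdio /var/run/haproxy.stat

    # check what remains
    echo "show table bk_app" | socat stdio /var/run/haproxy.stat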

Regards,
Willy




Re: 5000 CPS for haproxy

2011-08-02 Thread Willy Tarreau
Hello,

On Mon, Aug 01, 2011 at 07:00:37PM +0530, appasaheb bagali wrote:
> hello,
>
> we have deployed haproxy on the amazon cloud.
>
> it's working fine; we would like to test it at 5000 CPS.
> Please suggest a way to test this.

There are various tools for that. The principle is that you should
start some dummy servers on other instances (or at least fast static
servers such as nginx), and run injection tools on yet other instances.
Such tools might be httperf, ab, inject or anything similar. You will
then configure your haproxy to forward to the dummy servers and will
send your injectors' requests to haproxy. The tools will tell you
the data rate, connection rate, etc. You're encouraged to enable
the stats page on haproxy so that you can check rates and errors
live.
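
As an illustration only (the addresses, request counts and rates are
placeholders, not recommendations), a run against the haproxy frontend
could look like:

    # ApacheBench: 100k requests, 250 concurrent connections
    ab -n 100000 -c 250 http://10.0.0.10:80/

    # httperf: open connections at a fixed rate to find the saturation point
    httperf --server 10.0.0.10 --port 80 --uri / --num-conns 100000 --rate 5000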

In general, for 5k CPS, you need a bit of system tuning, because most
Linux distros ship with conntrack settings which are only suitable for
desktop usage, not for server usage, so the traffic will suddenly
stop after a few seconds. Or better, simply disable the module.
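
A minimal sketch of both options (the sysctl names and values vary with the
kernel and are only assumptions to adapt, not drop-in numbers):

    # either raise the conntrack table and hash sizes...
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    echo 131072 > /sys/module/nf_conntrack/parameters/hashsize

    # ...or unload conntrack entirely if nothing on the box needs it
    modprobe -r iptable_nat nf_conntrack_ipv4 nf_conntrack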

Also, it is important that you have at least two machines for the
servers and at least two for the clients, because in such environments,
you have no visibility on anything, and it's quite common that some VMs
are struggling or that some network paths are saturated. If you see that
two servers behave differently, at least it's easier to spot where the
problem is.

Regards,
Willy




Re: 5000 CPS for haproxy

2011-08-02 Thread carlo flores
To add to this: there is a great automated tool (and set of ideas) from The
Chicago Tribune called Bees with Machine Guns, which spins up n AWS micro
instances to push traffic at the target server.

https://github.com/newsapps/beeswithmachineguns

My CTO makes the argument that connections/s or sessions/s don't mean much
unless those sessions carry realistic user traffic (which exercises the
application/database/etc). This is not the methodology you're using to test
HAProxy, of course, but it is something I think about enough that I feel
obligated to type about it. If you care, we do this with Ruby's Net::HTTP
library making specific calls on existing sessions to our RESTful servers,
and those calls are built on random but real user data.




Re: 5000 CPS for haproxy

2011-08-02 Thread Baptiste
Hi Carlo,

Before testing the application itself, you must first test the infrastructure ;)
Once you know how much your infrastructure can deliver, then your
bench of the application makes sense.
This is a step-by-step method, from the lowest layer to the highest.

Before testing your application in a virtualized environment, you
should bench it on physical servers, because in a virtualized
environment you're sharing resources with everybody and the behaviour
may be odd under heavy load.

By the way, do you have a few Ruby examples? I'm interested in your
way of testing applications.
A long time ago, I used Perl and libwww.

cheers :)




Re: 5000 CPS for haproxy

2011-08-02 Thread carlo flores
This is true; however, in practice your first concern with the infrastructure
is its first bottleneck, and (frankly) in many architectures that's probably
not (properly tuned) HAProxy. That's all I'm saying, and again I understand
why it's not directly relevant here. I hope others on this list understand
why I mention this when folks talk of benchmarks.

We are not yet ready to release our benchmark examples. This is Ops' (my)
fault. We will, however, release them under our public repositories at
https://github.com/borderstylo.




Re: maintenance mode and server affinity

2011-08-02 Thread James Bardin
On Tue, Aug 2, 2011 at 2:52 AM, Willy Tarreau w...@1wt.eu wrote:


> Are you sure your server was set in maintenance mode, and did you not
> just set its weight to zero?


Yes. I've confirmed that when using a stick-table for persistence,
putting a server in maintenance mode does not block traffic from
existing sessions.

I'm using the latest stable 1.4.15, built on CentOS 5.



> I think the latter can be done on the stats socket using "clear table",
> because you can specify a rule to select which entries to clear, so you
> can clear any entry matching your server's ID. But it's only in 1.5, not
> in a stable release.

I saw "clear table" in the dev version after I sent this. Since it
seems that I'm experiencing a bug in maintenance mode, the proper
behavior combined with "clear table" would be everything I need.


If you need any more info to help troubleshoot this, let me know.

-jim



Re: maintenance mode and server affinity

2011-08-02 Thread Willy Tarreau
On Tue, Aug 02, 2011 at 09:00:08AM -0400, James Bardin wrote:
> On Tue, Aug 2, 2011 at 2:52 AM, Willy Tarreau w...@1wt.eu wrote:
> >
> > Are you sure your server was set in maintenance mode, and did you not
> > just set its weight to zero?
>
> Yes. I've confirmed that when using a stick-table for persistence,
> putting a server in maintenance mode does not block traffic from
> existing sessions.
>
> I'm using the latest stable 1.4.15, built on CentOS 5.

OK, thanks for confirming. Could you check if you have "option persist"
somewhere in your config? From what I can tell from the code, this is
the only reason why a server set in maintenance mode would be selected:

    if ((srv->state & SRV_RUNNING) ||
        (px->options & PR_O_PERSIST) ||
        (s->flags & SN_FORCE_PRST)) {
            s->flags |= SN_DIRECT | SN_ASSIGNED;
            set_target_server(&s->target, srv);
    }

- the server does not have the SRV_RUNNING flag in maintenance mode
- the persist option on the backend might be one reason
- I'm assuming there is no force-persist rule

If you have "option persist", you should definitely remove it, as it's
done exactly for the behaviour you're experiencing: force a persistent
connection to go to a server even if it's marked as dead, and only
redispatch in case of connection error.

> > I think the latter can be done on the stats socket using "clear table",
> > because you can specify a rule to select which entries to clear, so you
> > can clear any entry matching your server's ID. But it's only in 1.5, not
> > in a stable release.
>
> I saw "clear table" in the dev version after I sent this. Since it
> seems that I'm experiencing a bug in maintenance mode, the proper
> behavior combined with "clear table" would be everything I need.
>
> If you need any more info to help troubleshoot this, let me know.

If you don't have "option persist", please post your config (or send it
privately if you prefer). Anyway, *please* remove any possible password
or sensitive information from the config before you send it.

Willy




Re: maintenance mode and server affinity

2011-08-02 Thread James Bardin
On Tue, Aug 2, 2011 at 2:44 PM, Willy Tarreau w...@1wt.eu wrote:


> OK, thanks for confirming. Could you check if you have "option persist"
> somewhere in your config? From what I can tell from the code, this is
> the only reason why a server set in maintenance mode would be selected:
>
>     if ((srv->state & SRV_RUNNING) ||
>         (px->options & PR_O_PERSIST) ||
>         (s->flags & SN_FORCE_PRST)) {
>             s->flags |= SN_DIRECT | SN_ASSIGNED;
>             set_target_server(&s->target, srv);
>     }
>
> - the server does not have the SRV_RUNNING flag in maintenance mode
> - the persist option on the backend might be one reason
> - I'm assuming there is no force-persist rule


OK, that's it.

I didn't realize that was the same code path as for manually disabled
servers. I had "option persist" in there to prevent a server that misses
a few health checks under load from dumping all its clients. Graceful
maintenance is more important than that edge case though, so I'll
remove it.


Thanks!
-jim



Re: maintenance mode and server affinity

2011-08-02 Thread Willy Tarreau
On Tue, Aug 02, 2011 at 03:08:32PM -0400, James Bardin wrote:
> > - the server does not have the SRV_RUNNING flag in maintenance mode
> > - the persist option on the backend might be one reason
> > - I'm assuming there is no force-persist rule
>
> OK, that's it.

Fine!

> I didn't realize that was the same code path as for manually disabled
> servers. I had "option persist" in there to prevent a server that misses
> a few health checks under load from dumping all its clients. Graceful
> maintenance is more important than that edge case though, so I'll
> remove it.

If you want a server that misses a few health checks to remain up, then
simply increase its "fall" parameter :-)
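
For illustration only (the backend name, addresses and the admin ACL are
hypothetical), that could look something like this:

    backend bk_app
        # let admin hosts reach a server even when it is disabled, for testing
        acl from_admin src 192.168.0.0/24
        force-persist if from_admin
        # only mark the server down after 10 consecutive failed checks
        server web1 10.0.0.1:80 check inter 2s fall 10 rise 2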

Thanks for the quick reply!
Willy




HAProxy backup servers still receive requests?

2011-08-02 Thread Gi Dot
Hi,

Referring to this thread
http://comments.gmane.org/gmane.comp.web.haproxy/6037, I have modified my
haproxy configuration by setting up a couple of my jails as backup servers.
I converted them to backups due to hundreds of errors reported on the
haproxy stats page. But I wonder why I can still see a number of errors
reported even after configuring them as backups? I thought no requests
should go to backup servers.

Here's a screenshot of my haproxy stats page. This was taken last night, and
the numbers have slightly increased today.

http://people.freebsd.org/~miwi/ha.png

Thanks.


Re: HAProxy backup servers still receive requests?

2011-08-02 Thread Willy Tarreau
On Wed, Aug 03, 2011 at 01:14:52PM +0800, Gi Dot wrote:
> Hi,
>
> Referring to this thread
> http://comments.gmane.org/gmane.comp.web.haproxy/6037, I have modified my
> haproxy configuration by setting up a couple of my jails as backup servers.
> I converted them to backups due to hundreds of errors reported on the
> haproxy stats page. But I wonder why I can still see a number of errors
> reported even after configuring them as backups? I thought no requests
> should go to backup servers.

A backup server can take requests which carry a reference to its cookie
if no other server is available for that cookie. It is possible that this
is what's happening. Maybe you changed their state and restarted haproxy,
so the users which were being processed by these servers finish their work
there and no new users are assigned to them, which seems to be true given
the ratio between their usage and that of the other servers.

> Here's a screenshot of my haproxy stats page. This was taken last night, and
> the numbers have slightly increased today.
>
> http://people.freebsd.org/~miwi/ha.png

If the numbers still increase, it means that some users have kept their browser
open and continue to address the same server. Against this, I invite you to take
a look at the "maxlife" cookie option, which says that a cookie that is too old
will be ignored and the user will be redispatched anyway. Similarly there's the
"maxidle" parameter, which does the same but only for idle periods. The values are
very dependent on the site, but some might want "maxidle 1h maxlife 1d" so that
cookies from browsers that remained idle for more than 1 hour are ignored, and
cookies assigned more than 1 day ago are ignored.
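
A minimal sketch of those options on a backend (the backend/server names and
addresses are placeholders):

    backend bk_app
        # redispatch users whose cookie has been idle > 1h or is older than 1 day
        cookie SRVID insert indirect nocache maxidle 1h maxlife 1d
        server web1 10.0.0.1:80 cookie w1 check
        server web2 10.0.0.2:80 cookie w2 check backup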

If your issue with those servers is temporary, you can also force them into
maintenance from the stats socket, or you can use the disabled setting on
the server line, which really disables them, as opposed to the backup mode
where they're supposed to still take some specific traffic.
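
For example (the server name and address are placeholders), the "disabled"
keyword simply goes on the server line:

    # the server starts in maintenance mode and takes no traffic at all
    server web2 10.0.0.2:80 cookie w2 check disabled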

Regards,
Willy