Re: maintenance mode and server affinity
Hi James,

On Mon, Aug 01, 2011 at 04:05:41PM -0400, James Bardin wrote:
> I have a number of instances using tcp mode, and a stick-table on src ip
> for affinity. When a server is in maintenance mode, clients with an
> existing affinity will still connect to the disabled server, and only be
> re-dispatched if the connection fails (and error responses from the
> backend are still successful tcp connections).

Are you sure your server was set in maintenance mode, and that you did not
just set its weight to zero? There is a big difference between zero weight
and maintenance mode:

  - zero weight means the server is not selected in load balancing, so it
    will not get any new visitors, but will still get existing visitors;

  - maintenance means the server is offline and must not receive any
    traffic at all, except the admin's tests selected with force-persist
    rules.

So if this is not what you're observing, then it's a bug and we need to
see how to reproduce it in order to fix it.

> I've done a few things to stop this traffic when needed:
>  - drop the packets on the load balancer with a null route or iptables.
>  - block the packets with the firewall on the backend server, and let
>    the clients get re-dispatched.
>  - shut down the services that could respond from the backend, and
>    re-dispatch.
>
> Have I missed any configuration in haproxy that will completely stop
> traffic to a backend? I have no problem managing this as-is myself, but
> having fewer pieces involved makes delegating administration
> responsibilities easier.

I agree with you. The maintenance mode was done exactly for what you need,
so I want to ensure it works.

> Willy, is a "block server" option (or maybe a way to drop table entries
> to get rid of affinity sessions) something that could be implemented?

I think the latter can be done on the stats socket using "clear table",
because you can specify a rule to select which entries to clear, so you
can clear any entry matching your server's ID. But it's only in 1.5, not
in a stable release.

Regards,
Willy
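For reference, both operations can be driven through the stats socket. A
rough sketch using socat; the socket path, backend/server names and the
server ID are placeholders, "disable server" requires a socket declared
with "level admin", and the "clear table" filter is 1.5-dev syntax only:

    # put a server into maintenance mode
    echo "disable server mybackend/srv1" | socat stdio /var/run/haproxy.sock

    # 1.5-dev only: drop the stick-table entries pointing at that server
    # (server_id 1 is assumed here) so existing affinity is forgotten
    echo "clear table mybackend data.server_id eq 1" | socat stdio /var/run/haproxy.sock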
Re: 5000 CPS for haproxy
Hello,

On Mon, Aug 01, 2011 at 07:00:37PM +0530, appasaheb bagali wrote:
> hello,
>
> we have deployed HAProxy on the Amazon cloud. It's working fine. We
> would like to test 5000 CPS. Please suggest a way to test this.

There are various tools for that. The principle is that you should start
some dummy servers on other instances (or at least fast static servers
such as nginx), and run injection tools on yet other instances. Such tools
might be httperf, ab, inject or any such thing. You will then configure
your haproxy to forward to the dummy servers and will send your injectors'
requests to haproxy. The tools will tell you the data rate, connection
rate, etc. You're encouraged to enable the stats page on haproxy so that
you can check rates and errors live.

In general, for 5k CPS, you need a bit of system tuning, because most
Linux distros come with a conntrack setting which is only valid for
desktop usage, not for server usage, so the traffic will suddenly stop
after a few seconds. Or better, simply disable the module.

Also, it is important that you have at least two machines for the servers
and at least two for the clients, because in such environments you have no
visibility on anything, and it's quite common that some VMs are struggling
or that some network paths are saturated. If you see that two servers
behave differently, at least it's easier to spot where the problem is.

Regards,
Willy
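A minimal sketch of the tuning and injection run described above; the
sysctl value, the target name "haproxy-host" and the httperf parameters
are illustrative only, to be adapted to the actual setup:

    # enlarge the conntrack table so it doesn't fill up mid-test,
    # or simply unload the module if no iptables rules need it
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    # alternative: modprobe -r nf_conntrack

    # push 5000 new connections per second at the proxy, 300k total
    httperf --server haproxy-host --port 80 --uri / \
            --rate 5000 --num-conns 300000 --timeout 5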
Re: 5000 CPS for haproxy
To add to this: there is a great automated tool (and set of ideas) from
the Chicago Tribune called Bees With Machine Guns, which spins up n AWS
micro instances to push traffic to the target server.

https://github.com/newsapps/beeswithmachineguns

My CTO makes the argument that connections/s or sessions/s don't mean much
unless those sessions are testing realistic user traffic (which exercises
the application/database/etc). This is not the methodology you're using to
test HAProxy, of course, but it is something I think about enough that I
feel obligated to type about it. If you care, we do this with Ruby's
Net::HTTP library, making specific calls on existing sessions to our
RESTful servers, and those calls are built on random but real user data.
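The typical bees workflow looks roughly like this; the instance count,
request numbers, key pair and security group names are placeholders to
adapt to your AWS account:

    bees up -s 4 -g public -k my-keypair      # spin up 4 attacker instances
    bees attack -n 100000 -c 250 -u http://target-host/
    bees down                                 # tear the instances down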
Re: 5000 CPS for haproxy
Hi Carlo,

Before testing the application itself, you must first test the
infrastructure ;) Once you know how much your infrastructure can deliver,
then your benchmark makes sense. This is a step-by-step method, from the
lower layer to the higher one.

Before testing your application in a virtualized environment, you should
bench it on physical servers, because in a virtualized environment you're
sharing resources with everybody and the behavior may be odd under heavy
load.

By the way, do you have a few Ruby examples? I'm interested in your way of
testing applications. A long time ago, I used perl and libwww.

cheers :)
Re: 5000 CPS for haproxy
This is true; however, in practice your first concern with the
infrastructure is the first bottleneck, and (frankly) in many
architectures it's probably not (properly tuned) HAProxy. That's all I'm
saying, and again, I understand why that's not relevant here. I hope
others on this list understand why I mention this when folks talk of
benchmarks.

We are not yet ready to release our benchmark examples. This is Ops' (my)
fault. We will, however, release them under our public repositories at
https://github.com/borderstylo.
Re: maintenance mode and server affinity
On Tue, Aug 2, 2011 at 2:52 AM, Willy Tarreau w...@1wt.eu wrote:
> Are you sure your server was set in maintenance mode, and that you did
> not just set its weight to zero?

Yes. I've confirmed that when using a stick-table for persistence, putting
a server in maintenance mode does not block traffic from existing
sessions. I'm using the latest stable 1.4.15, built on centos5.

> I think the latter can be done on the stats socket using "clear table",
> because you can specify a rule to select which entries to clear, so you
> can clear any entry matching your server's ID. But it's only in 1.5, not
> in a stable release.

I saw "clear table" in the dev version after I sent this. Since it seems
that I'm experiencing a bug in maintenance mode, the proper behavior
combined with "clear table" would be everything I need.

If you need any more info to help troubleshoot this, let me know.

-jim
Re: maintenance mode and server affinity
On Tue, Aug 02, 2011 at 09:00:08AM -0400, James Bardin wrote:
> Yes. I've confirmed that when using a stick-table for persistence,
> putting a server in maintenance mode does not block traffic from
> existing sessions. I'm using the latest stable 1.4.15, built on centos5.

OK, thanks for confirming. Could you check whether you have "option
persist" somewhere in your config? From what I can tell from the code,
this is the only reason why a server set in maintenance mode would be
selected:

    if ((srv->state & SRV_RUNNING) ||
        (px->options & PR_O_PERSIST) ||
        (s->flags & SN_FORCE_PRST)) {
            s->flags |= SN_DIRECT | SN_ASSIGNED;
            set_target_server(&s->target, srv);
    }

  - the server does not have the SRV_RUNNING flag in maintenance mode;
  - the persist option on the backend might be one reason;
  - I'm assuming there is no force-persist rule.

If you have "option persist", you should definitely remove it, as it is
done exactly for the behaviour you're experiencing: force a persistent
connection to go to a server even if it's marked as dead, and only
redispatch in case of connection error.

> I saw "clear table" in the dev version after I sent this. Since it
> seems that I'm experiencing a bug in maintenance mode, the proper
> behavior combined with "clear table" would be everything I need.
>
> If you need any more info to help troubleshoot this, let me know.

If you don't have "option persist", please post your config (or send it
privately if you prefer). Anyway, *please* remove any passwords or
sensitive information from the config before you send it.

Willy
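As an illustration of the fix (names and addresses below are made up), a
tcp backend with source affinity would simply drop "option persist", and
could use a force-persist rule instead if admins still need to reach a
disabled server:

    backend app
        mode tcp
        balance roundrobin
        # source-IP affinity through a stick-table
        stick-table type ip size 200k expire 30m
        stick on src
        # no "option persist" here, so maintenance mode fully stops traffic
        # optional: still let admin tests reach a disabled server
        acl from_admin src 10.0.0.0/24
        force-persist if from_admin
        server srv1 192.168.1.10:80 check
        server srv2 192.168.1.11:80 check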
Re: maintenance mode and server affinity
On Tue, Aug 2, 2011 at 2:44 PM, Willy Tarreau w...@1wt.eu wrote:
> OK, thanks for confirming. Could you check whether you have "option
> persist" somewhere in your config? From what I can tell from the code,
> this is the only reason why a server set in maintenance mode would be
> selected:
>
>  - the server does not have the SRV_RUNNING flag in maintenance mode;
>  - the persist option on the backend might be one reason;
>  - I'm assuming there is no force-persist rule.

OK, that's it. I didn't realize that was the same code path for manually
disabled servers. I had "option persist" in there to prevent a server that
misses a few healthchecks under load from dumping all its clients.
Graceful maintenance is more important than this edge case though, so I'll
remove it.

Thanks!
-jim
Re: maintenance mode and server affinity
On Tue, Aug 02, 2011 at 03:08:32PM -0400, James Bardin wrote:
> OK, that's it.

Fine!

> I didn't realize that was the same code path for manually disabled
> servers. I had "option persist" in there to prevent a server that misses
> a few healthchecks under load from dumping all its clients. Graceful
> maintenance is more important than this edge case though, so I'll remove
> it.

If you want a server that misses a few health checks to remain up, then
simply increase its "fall" parameter :-)

Thanks for the quick reply!
Willy
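For example (the address and timings are illustrative), with the default
of "fall 3" a server is marked DOWN after three consecutive failed checks;
raising the value gives it more slack:

    # tolerates up to 6 consecutive failed checks (~12s at inter 2s)
    # before the server is marked DOWN
    server web1 192.168.1.10:80 check inter 2s rise 2 fall 6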
HAProxy backup servers still receive requests?
Hi,

Referring to this thread
http://comments.gmane.org/gmane.comp.web.haproxy/6037, I have modified my
haproxy configuration by setting up a couple of my jails as backup
servers. I converted them to backups due to hundreds of errors reported on
the haproxy stats page. But I wonder why I can still see a number of
errors reported, even after configuring them as backups? I thought no
requests should go to backup servers.

Here's a screenshot of my haproxy stats page. This was taken last night,
and the numbers have slightly increased today.

http://people.freebsd.org/~miwi/ha.png

Thanks.
Re: HAProxy backup servers still receive requests?
On Wed, Aug 03, 2011 at 01:14:52PM +0800, Gi Dot wrote:
> I have modified my haproxy configuration by setting up a couple of my
> jails as backup servers. I converted them to backups due to hundreds of
> errors reported on the haproxy stats page. But I wonder why I can still
> see a number of errors reported, even after configuring them as backups?
> I thought no requests should go to backup servers.

A backup server can take requests which contain a reference to its cookie
if no other server is available for that cookie. It is possible that this
is what's happening. Maybe you changed their state and restarted haproxy,
so the users which were being processed by these servers finish their work
there and no new users are assigned to them, which seems to be true given
the ratio between their usage and the other servers'.

> Here's a screenshot of my haproxy stats page. This was taken last night,
> and the numbers have slightly increased today.
>
> http://people.freebsd.org/~miwi/ha.png

If the numbers still increase, it means that some users have kept their
browser open and continue to address the same server. Against this, I
invite you to take a look at the "maxlife" cookie option, which says that
a too-old cookie will be ignored and the user will be redispatched anyway.
Similarly, there's the "maxidle" parameter, which does the same but only
counts idle periods. The values are very dependent on the site, but some
might want "maxidle 1h maxlife 1d", so that cookies from browsers that
remained idle more than 1 hour are ignored, and cookies assigned more than
1 day ago are ignored.

If your issue with those servers is temporary, you can also force them
into maintenance from the stats socket, or you can use the "disabled"
setting on the server line, which really disables them, as opposed to the
backup mode where they're supposed to still take some specific traffic.

Regards,
Willy
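A minimal sketch of both suggestions, with made-up names and addresses;
"maxidle" and "maxlife" apply to cookies in insert mode:

    backend www
        balance roundrobin
        # ignore cookies idle for >1h or issued >1d ago and redispatch
        cookie SRVID insert indirect nocache maxidle 1h maxlife 1d
        server web1 192.168.1.10:80 cookie w1 check
        server web2 192.168.1.11:80 cookie w2 check
        # fully out of rotation, unlike "backup":
        server web3 192.168.1.12:80 cookie w3 check disabled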