Re: haproxy API patch

2011-08-28 Thread Willy Tarreau
Hi Jeff,

sorry for the late response, your message is one of the few I found
unread in my mailbox after moving a lot of ML junk out of it.

On Fri, Aug 19, 2011 at 09:05:53AM -0400, Jeff Buchbinder wrote:
> The API stats (pool.content, etc.) calls that I had implemented are
> essentially the same, except that they format the data in a way that is
> much more easily consumed by other services. (JSON-formatted data beats
> CSV in an ease-of-use context for anything other than feeding a
> spreadsheet or awk/sed/cut/grepping the data.)

I'm well aware of this too. That's why I wanted us to use JSON in the
API at Exceliance.
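
(As an illustration of the kind of conversion involved, here is a minimal
sketch, assuming a stats socket bound at /var/run/haproxy.sock; the path
and the field handling are assumptions, not part of either API:)

    # Minimal sketch: read HAProxy's CSV "show stat" output from the
    # stats socket and re-emit it as JSON. Socket path is an assumption.
    import json
    import socket

    def show_stat_json(path="/var/run/haproxy.sock"):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(path)
        s.sendall(b"show stat\n")
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
        s.close()
        lines = [l for l in data.decode().splitlines() if l.strip()]
        # the first line is the CSV header, prefixed with "# "
        header = lines[0].lstrip("# ").split(",")
        rows = [dict(zip(header, l.split(","))) for l in lines[1:]]
        return json.dumps(rows, indent=2)

    if __name__ == "__main__":
        print(show_stat_json())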

(...)
> > => This means that you need your file-based config to always be in
> >    sync with the API changes. There is no reliable way of doing so
> >    in any component if the changes are applied at two distinct
> >    places at the same time!
>
> It depends on what you're using haproxy for. If you're populating the
> configuration from the API (which is my eventual goal, if possible) for
> an elastic/dynamic server pool scenario, where servers are brought into
> the pool dynamically, configuration file persistence matters much less.

But you still need to populate your config before starting the daemon,
otherwise a restart may be fatal simply because the first few seconds
before you update its conf break the site.

(...)
> > There is only one way to solve these classes of issues, by respecting
> > those two rules:
> >   - the changes must be performed in one single place, which is the
> >     reference (here the config file)
> >   - the changes must then be applied using the normal process from
> >     this reference
>
> I would think it would also be possible to replay a list of
> modifications to the original configuration, which would not require
> rewriting the original config. Not a perfect solution, but another
> possibility. (The downside is that a change to the original
> configuration could change the way the replayed actions behave.)

Yes, that's the problem. Replaying is only valid in an independent context.
That's the problem we have with the defaults sections. They're quite handy,
but they change a lot of semantics when it comes to configuring the
sections that depend on them. If your main config file gets a change, it's
very possible that replaying your changes will not do the right thing again.
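
(A small config sketch of the hazard, with made-up names: the listener
below inherits its server timeout from the defaults section, so replaying
a recorded change after someone edits defaults silently applies different
semantics:)

    defaults
        timeout server 60s   # later edited to 5s by an admin

    listen app
        bind :8000
        # no explicit "timeout server" here: it is inherited from defaults.
        # Replaying a recorded "add this server" change against the edited
        # file re-creates the server, but now under a 5s server timeout
        # instead of the 60s in effect when the change was first made.
        server srv1 192.168.0.10:80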

> > What this means is that anything related to changing more than an
> > operational status must be performed on the config file first, then
> > propagated to the running processes using the same method that is
> > used upon start-up (config parsing and loading).
>
> That assumes that you're not dealing with a transient configuration, as
> I mentioned earlier. It's an admirable goal to allow configuration
> persistence for things like the pool.add and pool.remove methods (since
> those are, at the moment, the only two that touch the configuration in a
> way that would seriously break a stored config file).

As I indicated above, the idea of a transient config file scares me a lot.
Either you have no servers in it and you serve 503 errors to everyone when
you start, until the config is updated, or you have a bunch of old servers,
and in environments such as EC2 you send traffic to someone else's servers
because they were assigned your previous IP.

> Also, outside of pool.add and pool.remove, I'm not really doing anything
> conceptually beyond what the stats control socket already does. Weight
> and maintenance mode are not persisted to the configuration file. The
> only difference is the way that I'm allowing access to it (disregarding
> pool.add and pool.remove, of course).

Even the weight has different semantics in the config file and on the stats
socket. The stats socket controls the effective weight without affecting
the configured weight: that way you can set the weight to 100% on the
stats socket and get back the configured weight.
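
(For example, with the existing stats socket commands; the socket path and
the server name are made up, and the exact output is indicative only:)

    # effective vs. configured weight on the stats socket
    echo "get weight app/srv1" | socat stdio /var/run/haproxy.stat
    # e.g. "50 (initial 50)"
    echo "set weight app/srv1 50%" | socat stdio /var/run/haproxy.stat
    # effective weight becomes 25, configured weight stays 50
    echo "set weight app/srv1 100%" | socat stdio /var/run/haproxy.stat
    # back to the configured weight of 50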

> > Right now haproxy is not able to reload a config once it's started.
> > And since we chroot it, it will not be able to access the FS
> > afterwards. However, we can reload a new process with the new config
> > (that's what most of us are currently doing).
>
> That's also what I'm doing in our production setup. The importance of an
> accessible API, though, is that it allows third-party services (for
> example, a software deployer or cloud management service) to control
> certain aspects of the proxy without having to resort to kludges like
> using ssh to remotely push commands into a socket with socat. (Which, by
> the way, works just fine run locally with a wrapper script, but makes it
> more difficult to integrate into a deployment process.)
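
(For the record, that kludge looks something like this; the host name and
socket path are made up:)

    # pushing a stats-socket command over ssh: workable, but awkward
    # to wire into a deployment pipeline
    ssh lb1 'echo "disable server app/srv1" | socat stdio /var/run/haproxy.stat'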

Oh I know that well too ;-)
At the company, we decided to address precisely this issue with the API we
developed: it only touches the config file and never plays with the socket,
because right now we have not implemented any operational status changes.
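
(The config-as-reference approach then relies on the usual graceful reload;
a sketch, with paths as assumptions:)

    # edit the reference (the config file), then start a new process that
    # asks the old one to finish its sessions and exit (-sf)
    vi /etc/haproxy/haproxy.cfg
    haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
            -sf $(cat /var/run/haproxy.pid)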

[Proposal] Concurrency tuning by adding a limit to http-server-close

2011-08-28 Thread Cyril Bonté
Hi Willy and the list,

I couldn't find time for haproxy for some weeks. Now that I'm on holidays,
I'm trying to review some patches I had on my test machine.
One of them adds the possibility to limit the number of HTTP keep-alive
connections, to allow better concurrency between clients.

I propose to add a suboption to http-server-close to let haproxy fall
back to httpclose mode once a certain number of connections is reached
on the frontend.
The value can be defined:
- as an absolute limit
  Example:
    maxconn 1000
    option http-server-close limit 500

- or as a percentage of the frontend maxconn
  Example:
    maxconn 1000
    option http-server-close limit 75%
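
(A quick way to observe the fallback; this is a sketch that assumes the
proposed patch is applied and uses the 75% listener from the configuration
below. Under the limit nothing should be printed; above it, haproxy should
start adding a "Connection: close" header:)

    # port 8002 is the scl-with-limit-75pct listener defined below
    curl -sv http://localhost:8002/ 2>&1 | grep -i '^< connection'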

Let me illustrate the benefits, sorry if it's a bit long to read ;-)

* THE CONFIGURATION

First, I used this configuration :
(maxconn values were set to 150 to ease the tests on a laptop that was not
tuned for a high number of connections)
global
    log localhost local7 debug err

defaults
    timeout server 60s
    timeout client 60s
    timeout connect 5s
    timeout http-keep-alive 5s
    log global
    option httplog

listen scl-without-limit
    bind :8000
    maxconn 150
    mode http
    option http-server-close
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen close
    bind :8001
    maxconn 150
    mode http
    option httpclose
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-75pct
    bind :8002
    maxconn 150
    mode http
    option http-server-close limit 75%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-95pct
    bind :8003
    maxconn 150
    mode http
    option http-server-close limit 95%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-50pct
    bind :8004
    maxconn 150
    mode http
    option http-server-close limit 50%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-25pct
    bind :8005
    maxconn 150
    mode http
    option http-server-close limit 25%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

And I defined a test URL that waits some time before replying (100 ms in
these tests).
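
(Any sufficiently slow endpoint works; here is a minimal stand-in, not the
actual test URL. It listens on 8080 rather than 80 to avoid requiring root,
so the server lines above would point to 127.0.0.1:8080 instead:)

    # Minimal stand-in for the "slow" test URL: an HTTP/1.1 server that
    # sleeps 100 ms before answering, so keep-alive sessions hold their
    # connection between requests. Not the original test backend.
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class SlowHandler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"   # needed for keep-alive with ab -k

        def do_GET(self):
            time.sleep(0.1)             # the 100 ms of simulated work
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):   # keep the console quiet
            pass

    HTTPServer(("127.0.0.1", 8080), SlowHandler).serve_forever()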

* THE SCENARIO

The scenario I used is:
ab -H "User-Agent: test1" -n1 -c150 -k http://localhost:port/ &
sleep 1
ab -H "User-Agent: test2" -n1 -c150 -k http://localhost:port/ &
sleep 1
curl -H "User-Agent: test3" http://localhost:port/

and as soon as the ab instances are done, I launch a final ab test to
compare:
ab -H "User-Agent: test4" -n1 -c150 -k http://localhost:port/

I've written a log analyzer to sum up the scenario execution, second by
second.
For each test, it shows:
- the HTTP keep-alive efficiency
- when the test could really obtain its first response (a '|' character
  indicates that the test has started but is still waiting for a connection)
- how long the test ran before obtaining its last response
and the global keep-alive efficiency measured.
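
(Not the analyzer itself, but a rough sketch of the idea: bucket httplog
lines per second and per captured User-Agent. The regex keys on the accept
timestamp and the {captured-header} block of the default HTTP log format,
and it only counts requests per second, which is a simplification:)

    # Rough sketch, not the actual analyzer: count requests per second
    # and per captured User-Agent from haproxy httplog lines on stdin.
    import re
    import sys
    from collections import defaultdict

    LOG = re.compile(r"\[(?P<ts>[^\]]+)\].*?\{(?P<ua>[^}]*)\}")

    counts = defaultdict(lambda: defaultdict(int))
    for line in sys.stdin:
        m = LOG.search(line)
        if not m:
            continue
        second = m.group("ts").split(".")[0]   # drop the milliseconds
        counts[second][m.group("ua")] += 1

    for second in sorted(counts):
        row = "  ".join("%s:%d" % (ua, n)
                        for ua, n in sorted(counts[second].items()))
        print(second, row)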

* USING option http-server-close

Let's see what happens with this scenario when we use the current
http-server-close option:

Date      Frontend           {test1}  {test2}  {test3}  {test4}  Global
00:00:00  scl-without-limit  100                                 100
00:00:01  scl-without-limit  100      |                          100
00:00:02  scl-without-limit  100      |        |                 100
00:00:03  scl-without-limit  100      |        |                 100
00:00:04  scl-without-limit  100      |        |                 100
00:00:05  scl-without-limit  100      |        |                 100
00:00:06  scl-without-limit  100      |        |                 100
00:00:07  scl-without-limit  100      |        |                 100
00:00:08  scl-without-limit  100      |        |                 100
00:00:09  scl-without-limit  100      |        |                 100
00:00:10  scl-without-limit  100      |        |                 100
00:00:11  scl-without-limit  100      |        |                 100
00:00:12  scl-without-limit  100      |        |                 100
00:00:13  scl-without-limit           100      |                 100
00:00:14  scl-without-limit           100      |                 100
00:00:15  scl-without-limit           100      |                 100
00:00:16  scl-without-limit           100      |                 100
00:00:17  scl-without-limit           100      |                 100
00:00:18  scl-without-limit           100      |                 100
00:00:19  scl-without-limit           100      |                 100
00:00:20  scl-without-limit           100      |                 100