Re: haproxy API patch

2011-08-28 Thread Willy Tarreau
Hi Jeff,

sorry for the late response, your message is one of the few I found
unread in my mailbox after moving a lot of ML junk out of it.

On Fri, Aug 19, 2011 at 09:05:53AM -0400, Jeff Buchbinder wrote:
 The API stats (pool.content, etc) calls that I had implemented are
 essentially the same, except that they format the data in a way that is
 much more easily consumed by other services. (JSON formatted data beats
 out CSV in an ease-of-use context for anything other than feeding to a
 spreadsheet or awk/sed/cut/grepping data.)

I'm well aware of this too. That's why I wanted us to use JSON in our
API at Exceliance.
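To illustrate the point, here is a minimal sketch of turning haproxy's CSV stats into JSON. The CSV sample is a trimmed, hypothetical excerpt of the "show stat" output (the real output has many more columns, but its header line also starts with "# "):

```python
import csv
import json

# Hypothetical, trimmed excerpt of haproxy's "show stat" CSV output.
SAMPLE_CSV = """# pxname,svname,scur,smax,status
www,FRONTEND,3,10,OPEN
www,srv1,1,4,UP
"""

def stats_csv_to_json(raw):
    # Drop the leading "# " so the first row becomes the header.
    lines = raw.lstrip("# ").splitlines()
    return json.dumps(list(csv.DictReader(lines)), indent=2)

print(stats_csv_to_json(SAMPLE_CSV))
```

Each CSV row becomes a self-describing JSON object, which is what makes it easier to consume from other services than positional CSV fields.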

(...)
=> This means that you need your file-based config to always be in
   sync with the API changes. There is no reliable way of doing so
   in any component if the changes are applied at two distinct
   places at the same time!
 
 It depends what you're using haproxy for. If you're populating the
 configuration from the API (which is my eventual goal, if possible) for
 an elastic/dynamic server pool scenario where servers will be brought
 into the pool dynamically, it doesn't matter as much about configuration
 file persistence.

But you still need to populate your config before starting the daemon,
otherwise a restart may be fatal simply because the first few seconds
before you update its conf break the site.

(...)
  There is only one way to solve these classes of issues, by respecting those
  two rules :
- the changes must be performed to one single place, which is the 
  reference
  (here the config file)
- the changes must then be applied using the normal process from this
  reference
 
 I would think it would also be possible to replay a list of
 modifications to the original configuration, which would not require
 rewriting the original config. Not a perfect solution, but another
 possibility. (The downside would potentially be that a change to the
 original configuration would change the way that the replayed actions
 would behave.)

Yes, that's the problem. Replaying is only valid in an independent context.
That's the problem we have with the defaults sections. They're quite handy
but they change a lot of semantics when it comes to configuring the
sections that depend on them. If your main config file gets a change, it's
very possible that replaying your changes will not do the right thing again.
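A toy illustration of that hazard (the names and config structure are invented for the example): a change log recorded against one version of the config silently does the wrong thing when replayed against a newer one:

```python
# Replay log recorded against version 1 of the config, before anyone
# knew every server would later need a cookie.
replay_log = [("add_server", "srv3", {})]

# Meanwhile the file is edited by hand: every server now gets a cookie.
base_v2 = {"servers": {name: {"cookie": name} for name in ("srv1", "srv2")}}

def replay(config, log):
    # Re-apply recorded changes verbatim, as a naive replay would.
    for op, name, opts in log:
        if op == "add_server":
            config["servers"][name] = dict(opts)
    return config

result = replay(base_v2, replay_log)
# srv3 is replayed verbatim from the old log, so it ends up as the only
# server without a cookie -- exactly the inconsistency discussed here.
print({name: srv.get("cookie") for name, srv in result["servers"].items()})
```

Nothing in the replay step fails, which is what makes this class of problem hard to detect: the result is syntactically valid but semantically stale.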

  What this means is that anything related to changing more than an
  operational status must be performed on the config file first, then
  propagated to the running processes using the same method that is used
  upon start up (config parsing and loading).
 
 That assumes that you're not dealing with a transient configuration (as
 I had mentioned earlier). It's an admirable goal to allow configuration
 persistence for things like the pool.add and pool.remove methods (since
 those are, at the moment, the only two that touch the configuration in a
 way that would seriously break a stored config file).

As I indicated above, the idea of a transient config file scares me a lot.
Either you have no server in it and you serve 503 errors to everyone when
you start, until the config is updated, or you have a bunch of old servers
and in environments such as EC2, you send traffic to someone else's servers
because they were assigned your previous IP.

 Also, outside of pool.add and pool.remove, I'm not really doing anything
 conceptually outside of what the stats control socket already has been
 doing. Weight and maintenance mode are not persisted to the
 configuration file. The only difference is the way that I'm allowing
 access to it (disregarding pool.add and pool.remove, of course).

Even the weight has different semantics in the config file and on the stats
socket. The stats socket controls the effective weight without affecting
the configured weight. The reason is that you can set the weight to 100%
on the stats socket and get back the configured weight.
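A toy model (not haproxy source, just an illustration of the semantics described above): the stats socket's percentage form scales the configured weight, so 100% restores it:

```python
# Toy model of the two weight notions: the configured weight comes from
# the config file and never changes at runtime; "set weight N%" on the
# stats socket adjusts only the effective weight, relative to it.
class Server:
    def __init__(self, configured_weight):
        self.configured_weight = configured_weight
        self.effective_weight = configured_weight

    def set_weight_percent(self, pct):
        # e.g. "set weight bk/srv1 50%" on the stats socket
        self.effective_weight = self.configured_weight * pct // 100

srv = Server(configured_weight=32)
srv.set_weight_percent(50)
print(srv.effective_weight)    # 16
srv.set_weight_percent(100)    # 100% brings back the configured weight
print(srv.effective_weight)    # 32
```

This is why a socket-side change never needs to be written back to the file: the file's value remains the reference the runtime value is expressed against.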

  Right now haproxy is not able to reload a config once it's started. And
  since we chroot it, it will not be able to access the FS afterwards.
  However we can reload a new process with the new config (that's what
  most of us are currently doing).
 
 That's also what I'm doing in our production setup. The importance of an
 accessible API, though, is that it allows third party services (for
 example, a software deployer or cloud management service) to control
 certain aspects of the proxy without having to resort to kludges like
 using ssh to remotely push commands into a socket with socat. (Which, by
 the way, works just fine run locally with a wrapper script, but makes it
 more difficult to integrate into a deployment process.)
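For reference, the kind of local wrapper alluded to above can be as small as this sketch. The socket path is an assumption; point it at whatever your "stats socket" directive declares:

```python
import socket

def stats_command(cmd, sock_path="/var/run/haproxy.sock"):
    """Push one command into haproxy's stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(cmd.encode() + b"\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:        # peer closes after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode()

# Example (needs a running haproxy with a "stats socket" configured):
#   print(stats_command("show info"))
```

This works fine locally, but as the poster notes, it is exactly the kind of glue that is awkward to drive remotely from a deployment pipeline, which is the gap an HTTP API fills.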

Oh I know that well too ;-)
At the company, we decided to address precisely this issue with the API we
developed: it only affects the config file and never plays with the socket,
because right now we have not implemented any operational status changes.

Re: haproxy API patch

2011-08-18 Thread Willy Tarreau
Hi Jeff,

On Sun, Aug 14, 2011 at 04:01:53PM -0400, Jeff Buchbinder wrote:
 I've been working on an API patch, where certain functionality is
 exposed over the stats HTTP service. The fork where I have been
 working on this is available here:
 
 https://github.com/jbuchbinder/haproxy
 
 The full patch (which I imported from my old working tree) is here, if
 you want to see just the changes:
 
 https://github.com/jbuchbinder/haproxy/commit/0f924468977fc71f2530837e3e44cf47fc00fd0f
 
 Documentation is available here:
 
 https://github.com/jbuchbinder/haproxy/blob/master/README.API
 
 It was recently suggested that I attempt to get this patch included
 upstream.

Well, you apparently did a nice amount of work. I'm not opposed to an API
(after all, we've developed one at Exceliance too), but we need to respect
a certain amount of basic component behaviour rules. An API will never do
more than what the component itself is able to do, it's just a convenient
(or sometimes at least a different) way of making it do something it is
able to do.

If you look at how other load balancers work (and most network equipment
too BTW), you generally have multiple interaction levels between the user
and the internal state :

  - monitoring : the user wants to check the current state of the product ;
  - stats : the user wants to check some stats that were aggregated over
a period, sometimes since last clear or last reboot. Those generally
are counters ;
  - operational status : the user wants to temporarily change something
for the current running session, because this can help him make some
other operations more transparent or better resist an unexpected
condition (eg: imbalanced servers after a failure).
  - configuration status : the user wants a change to be performed and
kept until a new configuration change undoes it.

For quite some time, monitoring and stats have been available under various
forms (http or unix socket). Recently, a few operational status changes were
brought first on the unix socket and later on the web page. Those are still
limited (enable/disable of a server, clear a table entry, change a server's
weight) and even more limited for the web access. Still that starts to fit
some usages.

All of these accesses are performed for the currently running session. This
means that if you restart the process, everything is lost, quite similar to
what you get if you reboot your router while you temporarily killed a
BGP session or shut a port. And this is expected: you want those changes
to be temporary because they're made in the process of something else.

The configuration status has a very different usage pattern : the change
that is performed must absolutely meet two important requirements :
  - the changes that are performed must be kept across a restart ;
  - what is performed must have the same effect after the restart that
it had during the hot change.

The first one is crucial : if your process dies and is restarted by a
monitoring tool without the user knowing it, all changes are lost and
nobody knows. Also, the process would restart with an old invalid config
which does not match what was running before the restart, until someone
pushes the changes again (provided someone is able to determine the diff
between what's in the file at the moment of restart and what was running
before it). Worse, some orthogonal changes may be performed live and in
the config file, making the addition of both incompatible. For instance,
you would add a server on the live API and in parallel, someone would
add the cookie from the config as well as to all other servers. If after
a restart you re-apply the same changes, you'll get a wrong config with
the last added server which does not have any cookie.

  => This means that you need your file-based config to always be in
     sync with the API changes. There is no reliable way of doing so
     in any component if the changes are applied at two distinct
     places at the same time!

The second point is important too : even if we assume that you find a way
to more-or-less ensure that your config file gets the equivalent changes
and is up to date, you must absolutely ensure that what is there will work
upon a restart and will exhibit the exact same behaviour.

There are a large number of issues that can arise from performing changes
in a different order than what is done at once upon start-up. Most people
who had to deal with Alteon LBs for instance, know that sometimes something
does not behave as expected after a change and a reboot fixes the issue
(eg: renumbering a large number of filters, or changing health checking
methods). And there's nothing really wrong with that, it's just that the
problem by itself is complex. On unix-like systems, many of us have already
been hit by an issue involving two services bound to the same port, one on
the real IP and the other one bound to 0.0.0.0. If you bind 0.0.0.0 first,
on most systems both may bind, 
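The wildcard-vs-specific bind overlap being described can be sketched as follows (the outcome is OS-dependent, so this is an illustration of the ordering problem, not a portability claim):

```python
import socket

def try_overlapping_binds(port):
    """Bind the wildcard address first, then try the specific one."""
    s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s1.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s1.bind(("0.0.0.0", port))            # wildcard bound first
    s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s2.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s2.bind(("127.0.0.1", port))      # specific address second
        both_bound = True
    except OSError:
        both_bound = False
    s1.close()
    s2.close()
    return both_bound
```

Whether the second bind succeeds depends on the OS and socket options in play, which is precisely why applying changes in a different order than a fresh start-up can produce a state a restart would not reproduce.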

Re: haproxy API patch

2011-08-15 Thread Brane F. Gračnar
On Sunday 14 of August 2011 22:01:53 Jeff Buchbinder wrote:
 I've been working on an API patch, where certain functionality is
 exposed over the stats HTTP service. The fork where I have been
 working on this is available here:
 
 https://github.com/jbuchbinder/haproxy
 
 The full patch (which I imported from my old working tree) is here, if
 you want to see just the changes:
 
 https://github.com/jbuchbinder/haproxy/commit/0f924468977fc71f2530837e3e44cf47fc00fd0f
 
 Documentation is available here:
 
 https://github.com/jbuchbinder/haproxy/blob/master/README.API
 

Don't get me wrong, but I think it would be much better to implement the API
on top of the unix socket statistics interface. I'm already working on it and
it will feature strong encryption and authentication.

Just my opinion. But you're right, we need a RESTful API for haproxy.

Best regards, Brane



haproxy API patch

2011-08-14 Thread Jeff Buchbinder
I've been working on an API patch, where certain functionality is
exposed over the stats HTTP service. The fork where I have been
working on this is available here:

https://github.com/jbuchbinder/haproxy

The full patch (which I imported from my old working tree) is here, if
you want to see just the changes:

https://github.com/jbuchbinder/haproxy/commit/0f924468977fc71f2530837e3e44cf47fc00fd0f

Documentation is available here:

https://github.com/jbuchbinder/haproxy/blob/master/README.API

It was recently suggested that I attempt to get this patch included
upstream.

-- 
Jeff Buchbinder
Principal Engineer / Interim Director of Infrastructure
Rave Mobile Safety, Inc
m: 860.838.3355
jbuchbin...@ravemobilesafety.com