Re: [openstack-dev] [Octavia] Question about where to render haproxy configurations

Eichberger, German Sat, 06 Sep 2014 21:56:20 -0700

Hi Stephen,

Thanks for taking the time to lay out the components clearly. I think we are 
pretty much on the same page ☺


Driver vs. Driver-less
I strongly believe that REST is a cleaner interface/integration point, but if
even Brandon believes that drivers are the better approach (having suffered
through the LBaaS v1 driver world, which is not an advertisement for this
approach), I will concede on that front. Let’s hope nobody makes an asynchronous
driver and/or writes straight to the DB ☺ That said, I still believe that adding
the driver interface now will lead to more complexity, and I am not sure we
will get the interface right in the first version. So let’s agree to develop
with a driver in mind but not allow third-party drivers before the interface
has matured. I think that is something we already sort of agreed to, but I just
want to make it explicit.

Multiple drivers/versions for the same controller
This is a really contentious point for us at HP: if we allow, say, multiple
drivers or even different versions of the same driver (e.g. A, B, C) to run in
parallel, testing will have to cover all the possible (version) combinations to
avoid potential side effects. That can get extensive really quickly. So HP is
proposing, given that we will have 100s of controllers anyway, to limit the
number of drivers per controller to 1 to aid testing. We can revisit that at a
future time when our testing capabilities have improved, but for now I believe
we should choose that to speed things up. I personally don’t see the need for
multiple drivers per controller; in an operator-grade environment we likely
don’t need to “save” on the number of controllers ;-) The only reason we might
need two drivers on the same controller is if an Amphora for whatever reason
needs to be talked to by two drivers (e.g. you install nginx and haproxy and
have a driver for each). This use case scares me, so we should not allow it.
We also see some operational simplifications from supporting only one driver
per controller: if we have an update for driver A, we don’t need to touch any
controller running driver B. Furthermore, we can keep the old version running
but make sure no new Amphora gets scheduled there, let it wind down through
attrition, and then stop that controller when it doesn’t have any more Amphorae
to serve.
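The one-driver-per-controller drain-down described above can be sketched in Python. This is a minimal illustration under assumed names (`Controller`, `schedule_amphora`, and `stoppable` are hypothetical, not Octavia code): new amphorae only go to non-draining controllers that run the requested driver, and a draining controller becomes stoppable once its last amphora is gone.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """A controller process that loads exactly one driver (hypothetical model)."""
    name: str
    driver: str             # e.g. "haproxy"
    driver_version: str     # e.g. "4.1"
    draining: bool = False  # True once it should receive no new amphorae
    amphorae: set = field(default_factory=set)


def schedule_amphora(controllers, driver, amp_id):
    """Place a new amphora on the least-loaded, non-draining controller
    that runs the requested driver."""
    candidates = [c for c in controllers
                  if c.driver == driver and not c.draining]
    if not candidates:
        raise LookupError(f"no active controller runs driver {driver!r}")
    target = min(candidates, key=lambda c: len(c.amphorae))
    target.amphorae.add(amp_id)
    return target


def stoppable(controllers):
    """Draining controllers with no amphorae left can be shut down."""
    return [c for c in controllers if c.draining and not c.amphorae]


# Usage: the old driver version winds down while the new one takes new work.
old = Controller("ctl-1", "haproxy", "4.0", draining=True, amphorae={"amp-7"})
new = Controller("ctl-2", "haproxy", "4.1")
print(schedule_amphora([old, new], "haproxy", "amp-8").name)  # ctl-2
old.amphorae.discard("amp-7")  # last amphora on the old controller retired
print([c.name for c in stoppable([old, new])])                # ['ctl-1']
```

Updating driver A in this model never touches a controller whose `driver` is B, which is the operational simplification claimed above.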

Lastly, I interpreted the term “VM driver” in the spec along the lines of what
we have in Libra: a driver interface on the Amphora agent that abstracts
starting/stopping haproxy (in case we end up on something different) and
abstracts writing the haproxy file. But that is for the agent on the Amphora.
I am sorry I got confused that way when reading the 0.5 spec, and I am
therefore happy we can have this discussion to make things clearer.

German

From: Stephen Balukoff [mailto:sbaluk...@bluebox.net]
Sent: Friday, September 05, 2014 6:26 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Octavia] Question about where to render haproxy 
configurations

Hi German,

Responses in-line:

On Fri, Sep 5, 2014 at 2:31 PM, Eichberger, German <german.eichber...@hp.com>
wrote:
Hi Stephen,

I think this is a good discussion to have and will make it clearer why we
chose a specific design. I also believe that by having this discussion we will
make the design stronger. I am still a little bit confused about what the
driver/controller/amphora agent roles are. In my driver-less design we don’t
have to worry about the driver, which in haproxy’s case will most likely be
split to some degree between the controller and the amphora device.

Yep, I agree that a good technical debate like this can help both to get many 
people's points of view and can help determine the technical merit of one 
design over another. I appreciate your vigorous participation in this process. 
:)

So, the purpose of the controller / driver / amphora and the responsibilities
they have are somewhat laid out in the Octavia v0.5 component design document,
but it's also possible that there weren't enough specifics in that document to
answer the concerns brought up in this thread. So, to that end, here is how I
see things:

The controller:
* Is responsible for concerns of the Octavia system as a whole, including the
intelligence around interfacing with the networking, virtualization, and other
layers necessary to set up the amphorae on the network and get them
configured.
* Will rarely, if ever, talk directly to the end-systems or -services (like 
Neutron, Nova, etc.). Instead it goes through a "clean" driver interface for 
each of these.
* The controller has direct access to the database where state is stored.
* Must load at least one driver; may load several drivers and choose between
them based on configuration logic (e.g. flavors, config file, etc.)
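A minimal sketch of how a controller might load drivers and pick one by flavor, assuming an `abc`-based driver interface. `AmphoraDriver`, `Controller`, `RecordingDriver`, and the flavor-to-driver mapping are hypothetical names for illustration, not a defined Octavia API.

```python
import abc
from typing import Dict, Optional


class AmphoraDriver(abc.ABC):
    """Hypothetical driver base class. The controller talks to amphorae only
    through methods like these: objects and code, not a RESTful API."""

    @abc.abstractmethod
    def update_config(self, amphora_id: str, listener: dict) -> None:
        """Push one listener's configuration to one amphora."""

    @abc.abstractmethod
    def get_health(self, amphora_id: str) -> dict:
        """Return the last known health status of one amphora."""


class Controller:
    def __init__(self, drivers: Dict[str, AmphoraDriver],
                 flavor_map: Dict[str, str], default: str):
        # drivers: driver name -> loaded driver instance
        # flavor_map: flavor name -> driver name (e.g. read from a config file)
        missing = ({default} | set(flavor_map.values())) - set(drivers)
        if missing:
            raise ValueError(f"drivers not loaded: {sorted(missing)}")
        self.drivers = drivers
        self.flavor_map = flavor_map
        self.default = default

    def driver_for(self, flavor: Optional[str] = None) -> AmphoraDriver:
        """Pick a driver by flavor, falling back to the default driver."""
        return self.drivers[self.flavor_map.get(flavor, self.default)]


class RecordingDriver(AmphoraDriver):
    """Stand-in driver for illustration; it only records calls."""

    def __init__(self):
        self.pushed = []

    def update_config(self, amphora_id, listener):
        self.pushed.append((amphora_id, listener))

    def get_health(self, amphora_id):
        return {"amphora_id": amphora_id, "status": "UNKNOWN"}
```

Keeping the interface as an abstract base class means a driver that forgets to implement a method fails at instantiation time with `TypeError`, rather than at its first call.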

The driver:
* Handles all communication to or from the amphorae
* Is loaded by the controller (ie. its interface with the controller is a base 
class, associated methods, etc. It's objects and code, not a RESTful API.)
* Speaks amphora-specific protocols on the back-end. In the case of the 
reference "haproxy" amphora, this will most likely be in the form of a RESTful 
API with an agent on the amp, as well as (probably) HMAC-signed UDP health, 
status and stats messages from the amp to the driver.
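The HMAC-signed health messages mentioned above could look roughly like this. The sketch assumes a shared per-amphora key and a wire format of a JSON payload followed by a 32-byte HMAC-SHA256 tag; both the format and the function names are assumptions. It also omits replay protection (e.g. a sequence number or timestamp checked by the driver), which a real design would need.

```python
import hashlib
import hmac
import json

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign_health(key: bytes, status: dict) -> bytes:
    """Amphora side: serialize a status dict and append an HMAC-SHA256 tag."""
    payload = json.dumps(status, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_health(key: bytes, datagram: bytes) -> dict:
    """Driver side: check the tag in constant time, then decode the payload."""
    if len(datagram) <= DIGEST_LEN:
        raise ValueError("datagram too short")
    payload, tag = datagram[:-DIGEST_LEN], datagram[-DIGEST_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad HMAC; dropping message")
    return json.loads(payload.decode("utf-8"))
```

On the wire, the amphora would send the output of `sign_health` with `socket.sendto`, and the driver would call `verify_health` on each received datagram, dropping anything that fails the check.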

The amphora:
* Does the actual load balancing
* Is managed by the controller through the driver.
* Should be as "dumb" as possible.
* Comes in different types, based on the software in the amphora image. (Though 
all amps of a given type should be managed by the same driver.) Types might 
include "haproxy," "nginx," "haproxy + nginx," "3rd party vendor X," etc.
* Should never have direct access to the Octavia database, and therefore 
attempt to be as stateless as possible, as far as configuration is concerned.
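To illustrate how "dumb" and stateless the amphora side can be, here is a sketch of an agent handler that only writes a received config file atomically. It assumes the config text has already been produced before it reaches this function; `apply_config` is a hypothetical name, and reloading haproxy afterwards is left out.

```python
import contextlib
import os
import tempfile


def apply_config(config_text: str, path: str) -> None:
    """Atomically replace the file at `path` with `config_text`.

    The agent holds no state of its own: everything it needs arrives in the
    request, and the file on disk is the only thing it changes.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cfg-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(config_text)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # on POSIX: readers see old or new, never half
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```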

To be honest, our current product does not have a "driver" layer per se, since 
we only interface with one type of back-end. However, we still render our 
haproxy configs in the controller. :)


So let’s try to sum up what we want a controller to do:

- Provision new amphora devices
- Monitor/manage health
- Gather stats
- Manage/perform configuration changes

The driver as described would:

- Render configuration changes in a specific format, e.g. haproxy

Amphora device:

- Communicate with the driver/controller to make things happen

So, as Doug pointed out, I can make a very thin driver that basically passes
everything through to the Amphora Device, or, at the other end of the
spectrum, a very thick driver that manages everything from the amphora life
cycle to whatever else (aka the kitchen sink). I know we are going for the
utmost flexibility, but I believe:

So, I'm not sure it's fair to characterize the driver I'm suggesting as "very
thick." If you get right down to it, I'm pretty sure the only major thing we
disagree on here is where the haproxy configuration is rendered: just before
it's sent over the wire to the amphora, or just after its JSON equivalent is
received over the wire from the controller.
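To make the point concrete, here is a sketch in which the same hypothetical `render_haproxy_cfg` function is used in both placements: in option A the driver renders and ships `haproxy.cfg` text; in option B the controller ships JSON and the amphora agent renders it. The JSON field names are illustrative assumptions, not an agreed Octavia schema.

```python
import json


def render_haproxy_cfg(listener: dict) -> str:
    """Turn a (hypothetical) JSON listener description into haproxy.cfg text."""
    pool = listener["pool"]
    lines = [
        f"frontend {listener['id']}",
        f"    bind {listener['vip']}:{listener['port']}",
        f"    mode {listener['protocol']}",
        f"    default_backend {pool['id']}",
        "",
        f"backend {pool['id']}",
        f"    mode {listener['protocol']}",
        f"    balance {pool['algorithm']}",
    ]
    for m in pool["members"]:
        lines.append(f"    server {m['id']} {m['address']}:{m['port']} check")
    return "\n".join(lines) + "\n"


# Option A (render in the driver): the driver ships finished config text.
def driver_payload(listener: dict) -> bytes:
    return render_haproxy_cfg(listener).encode("utf-8")


# Option B (render on the amphora): the controller ships JSON ...
def controller_payload(listener: dict) -> bytes:
    return json.dumps(listener).encode("utf-8")


# ... and the agent on the amphora turns it into the same text.
def agent_render(payload: bytes) -> str:
    return render_haproxy_cfg(json.loads(payload))
```

Both paths yield identical text; the difference is operational. In option A a change to `render_haproxy_cfg` ships with the driver, while in option B it ships in the amphora image.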


- When building an haproxy-centric controller, we don’t really know which
things belong in the controller and which belong in the driver. So my shortcut
is not to build a driver at all ☺

So, I've become more convinced that having a driver layer there is going to be 
important if we want to support 3rd party vendors creating their own amphorae 
at all (which I think we do). It's also going to be important if we want to be 
able to support other versions of open-source amphorae (or experimental 
versions prior to pushing out to a wider user-base, etc.)

Also, I think making ourselves use a driver here helps keep interfaces clean.
This helps us avoid spaghetti code and makes things more maintainable in the
long run.

- More flexibility increases complexity and makes it confusing for people to
develop components. Should this concern go into the controller, the driver, or
the amphora VM? Two of them? Three of them? Limiting choices makes that
simpler.

"Centralize intelligence / decentralize workload."  There will often be 
multiple ways we can solve certain problems, but if we try to follow this 
mantra, and use clean interfaces between components, it starts to become more 
clear which code strategies we should be following. Yes, it's sometimes hard to 
know the right way to do things-- which is why we end up having these wonderful 
debates. ;) But I don't think the answer is "this is hard, let's just lump 
everything together."

Also, rule of thumb (perhaps not stated in our constitution... yet):  Try to 
architect things so the most frequently deployed elements see the fewest 
changes. (This is actually related to the "centralize intelligence / 
decentralize workload" mantra in a round-about way: Central intelligence 
elements will be both fewer in number and more frequently changed than "dumb" 
workload components.) This makes managing change for large deployments easier. 
(Again, it's both easier and less risky to update 100 controllers versus 
10,000+ amphorae.)


HP's worry is that creating the potential to run multiple drivers (and driver
versions) on multiple versions of controllers, on multiple versions of amphora
devices, creates a headache for testing. For example, does the version 4.1
haproxy driver work with the version 4.2 controller on a 4.0 amphora device?
Which compatibility matrix do we need to build/test? Limiting each controller
to one driver can help make that manageable.

Ok, so, I think this is possibly where part of our misunderstanding comes from. 
I realize above that I said a single driver could talk to multiple versions of 
back-end amphorae via a couple of methods, but let's ignore that for a minute and
assume that we only test / assume drivers will be speaking with the latest 
version of the amphorae to which they correspond.
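The size of the test matrix under each assumption can be counted directly. The version lists and the controller-to-driver pinning below are made up for illustration, loosely following the 4.0/4.1/4.2 example in the quoted question.

```python
from itertools import product

controller_versions = ["4.1", "4.2"]
driver_versions = ["4.0", "4.1"]
amphora_versions = ["4.0", "4.1"]  # assumed sorted, oldest first

# Any driver version on any controller version against any amphora version.
full = list(product(controller_versions, driver_versions, amphora_versions))

# One driver (version) per controller; this pinning is hypothetical.
pinned_driver = {"4.1": "4.0", "4.2": "4.1"}
one_driver = [(c, pinned_driver[c], a)
              for c in controller_versions for a in amphora_versions]

# Additionally, test each driver only against the latest amphora image.
latest = amphora_versions[-1]
latest_only = [(c, pinned_driver[c], latest) for c in controller_versions]

print(len(full), len(one_driver), len(latest_only))  # 8 4 2
```

The full product grows multiplicatively with every new version on any axis; pinning one driver per controller and testing only against the latest amphora image each remove one factor.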

I should probably clarify something that I've been assuming but may not be 
obvious:  I'm assuming that the "version" of the amphorae (drawn mostly from 
the version of the glue scripts, agent, and other code we write which lives on 
the amphora) is numbered separately and moves at a different rate than the 
version of the driver.  Think of this like the version of the firmware and 
version of the driver used with your printer. Sometimes a major bugfix entails 
updating both the firmware and driver. However, it's also common for a bugfix / 
feature enhancement to involve only updating the printer driver version and not 
the printer firmware.

What I'm getting at here is that if we're doing the configuration rendering in 
the driver and not on the amphora, there will be some bugfixes / feature 
enhancements which only entail updating the driver because there are literally 
no changes that need to be made to the amphora for the bugfix / feature.

Does this actually happen? Yes! To give a concrete example drawn from our 
product history:  On our existing load balancer product, which is powered by 
stunnel + haproxy, a new OpenSSL vulnerability was discovered, the fix for which
was to add a line to the stunnel configuration disabling a certain kind of SSL 
negotiation. Since we were rendering configurations centrally on our 
controllers, all we needed to do was update the configuration template on our 
controller and push out new configs for anyone using SSL termination. Took 
literally 10 minutes to implement once we understood the problem, and we didn't 
have to touch or otherwise update the software or scripts running on our 
appliances at all.
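A sketch of that kind of central fix: regenerate stunnel configs from one template on the controller and produce new configs only for SSL-terminating load balancers. The template fields, the load balancer dict layout, and the `rerender_ssl` name are hypothetical; the thread does not name the actual stunnel directive used for the fix, so it is passed in as an opaque extra line. `accept`, `connect`, and `cert` are standard stunnel service options.

```python
from string import Template

# Hypothetical central template; "extra" carries fix lines added later.
STUNNEL_TEMPLATE = Template(
    "[${name}]\n"
    "accept = ${vip}:443\n"
    "connect = 127.0.0.1:${backend_port}\n"
    "cert = ${cert_path}\n"
    "${extra}"
)


def rerender_ssl(load_balancers, extra_lines):
    """Return {lb_id: new stunnel config} for SSL-terminating LBs only."""
    extra = "".join(line + "\n" for line in extra_lines)
    return {
        lb["id"]: STUNNEL_TEMPLATE.substitute(
            name=lb["id"], vip=lb["vip"], backend_port=lb["backend_port"],
            cert_path=lb["cert_path"], extra=extra)
        for lb in load_balancers if lb.get("ssl_termination")
    }
```

Pushing each resulting config to its amphora would then be an ordinary driver call; nothing on the amphora itself has to change.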

It's even easier for L7 feature enhancements: You don't even have to push 
anything out to the amphora, just update the controller / driver to expose the 
new feature and users can then start using it at will.

Are all feature enhancements / bugfixes this easy? No! How do you tell the 
difference between which changes are major and minor? Anything which touches 
the code running on the amphora is "major" (ie. like a firmware update). 
Anything which only touches the controller / driver is "minor" (ie. like a 
driver update).
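The major/minor rule above is simple enough to write down. The component names here are hypothetical; what matters is only whether a change touches code that runs on the amphora.

```python
# Hypothetical component names; where each one runs decides the class.
AMPHORA_SIDE = frozenset({"agent", "glue_scripts", "amphora_image"})
CENTRAL_SIDE = frozenset({"controller", "driver", "config_template"})


def classify_change(touched: set) -> str:
    """'major' if the change touches code on the amphora (like firmware),
    'minor' if it only touches the controller/driver (like a driver update)."""
    unknown = set(touched) - AMPHORA_SIDE - CENTRAL_SIDE
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return "major" if set(touched) & AMPHORA_SIDE else "minor"


print(classify_change({"config_template"}))  # minor
print(classify_change({"driver", "agent"}))  # major
```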

It seems strange to me that we'd force even minor changes to configurations to 
be "major" updates for the sake of sending 
JSON-which-will-immediately-be-turned-into-haproxy.cfg over the wire instead of 
just the haproxy.cfg. :/

So with that in mind:  Please understand that your model and mine do not have 
to differ in the slightest when it comes to how to manage 'major' updates, 
whether that be running a different driver / controller for the new amphora 
version (Ick!), or doing on-demand lazy upgrades of amphora as the driver 
discovers old, incompatible-versioned amphora it needs to update (probably 
smoothest way to handle this, possibly as a default action of the option 2 I 
mentioned above), or whether we force all amphora to be updated as soon as 
possible after a controller update (most risky and probably not the best way to 
handle this). We've yet to define exactly how this workflow should be handled, 
but it's actually somewhat secondary to the problem of where to render the 
configs.  (Maybe we should have a conversation about this in another thread?)

And in any case, I'm not seeing a need to ensure the driver works with anything
but the latest amphora image version to which it corresponds (again, keeping in
mind that amphora image and driver should be allowed to change at different
rates and are therefore versioned separately). :/ This is especially the case
if we define the default action on a failed config push to be: check the
amphora's version and upgrade it as necessary (ie. lazy upgrading)...
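A sketch of the lazy-upgrade default described above, assuming amphora versions are comparable tuples and that an `upgrade` callback returns an up-to-date replacement amphora. `push_config`, `IncompatibleAmphora`, and `FakeAmphora` are hypothetical names for illustration only.

```python
class IncompatibleAmphora(Exception):
    """Hypothetical error: the amphora could not apply a pushed config."""


def push_config(amphora, config, required_version, upgrade):
    """Push `config`; if that fails on an out-of-date amphora, upgrade it
    and retry once (lazy upgrading).

    `amphora` needs `.version` (a comparable tuple) and `.apply(config)`;
    `upgrade(amphora)` returns an up-to-date replacement amphora.
    """
    try:
        amphora.apply(config)
        return amphora
    except IncompatibleAmphora:
        if amphora.version >= required_version:
            raise  # already current: a real failure, not a version problem
        replacement = upgrade(amphora)
        replacement.apply(config)  # a second failure propagates to the caller
        return replacement


class FakeAmphora:
    """Stand-in amphora that rejects configs below image version 2.0."""

    def __init__(self, version):
        self.version = version
        self.config = None

    def apply(self, config):
        if self.version < (2, 0):
            raise IncompatibleAmphora(self.version)
        self.config = config


old = FakeAmphora((1, 3))
amp = push_config(old, "haproxy.cfg text", (2, 0), lambda a: FakeAmphora((2, 0)))
print(amp.version, amp.config)  # (2, 0) haproxy.cfg text
```

Amphorae that accept the new config are never touched, so only the ones that actually need a "major" update pay for it.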

Also, not that we can't revisit this of course:  But the v0.5 component design 
entailing a "VM Driver" already went through gerrit review and was approved (by 
yourself even!) This discussion was originally about where to render the 
haproxy configs, but it really seems like y'all are against the idea of having 
an amphora driver interface at all. :/

Stephen



--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
