Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: Driver / Management API

Ed Hall Fri, 02 May 2014 14:29:24 -0700

Hi all,

At Yahoo, load balancing is heavily used throughout our stack for both HA and
load distribution, even within the OpenStack control plane itself. This involves
a variety of technologies, depending on scale and other requirements. For large
scale + L7 we use Apache Traffic Server, while L3DSR is the mainstay of the
highest bandwidth applications, and a variety of technologies are used for
simple HA and lighter loads.


Each of these technologies has its own special operational requirements, and
although a single well-abstracted tenant-facing API to control all of them is
much to be desired, there can be no such luck for operators. A major concern for
us is ensuring that when a tenant* has an operational issue they can communicate
needs and concerns to operators quickly and effectively. This means that any
operator API must “speak the same language” as the user API while exposing the
necessary information and controls for the underlying technology.

*In this case a “tenant” might represent a publicly-exposed URL with tens of
millions of users or an unexposed service which could impact several such web
destinations.

                      -Ed


On May 2, 2014, at 9:34 AM, Eichberger, German
<german.eichber...@hp.com> wrote:

Hi Stephen + Adam,

Thanks Stephen and Adam for starting this discussion. I also see several 
different drivers. We at HP indeed use a pool of software load balancing 
appliances to replace any failing one. However, we are also interested in a 
model where we have load balancers in hot standby…

My hope with this effort is that we can somehow reuse the haproxy 
implementation and deploy it in different ways depending on the scalability 
and availability needs, akin to creating a strategy that deploys the same 
haproxy control layer in a pool, on a Nova VM, etc.

German


From: Stephen Balukoff [mailto:sbaluk...@bluebox.net]
Sent: Thursday, May 01, 2014 7:44 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: 
Driver / Management API

Hi Adam,

Thank you very much for starting this discussion! In answer to your questions 
from my perspective:

1. I think that it makes sense to start at least one new driver that focuses on 
running software virtual appliances on Nova nodes (the NovaHA you referred to 
above). The existing haproxy driver should not go away as I think it solves 
problems for small to medium size deployments, and does well for setting up, 
for example, a 'development' or 'QA' load balancer that won't need to scale, 
but needs to duplicate much of the functionality of the production load 
balancer(s).

On this note, we may want to actually create several different drivers 
depending on the appliance model that operators are using. From the discussion 
about HA that I started a couple weeks ago, it sounds like HP is using an HA 
model that concentrates on pulling additional instances from a waiting pool. 
The stingray solution you're using sounds like "raid 5" redundancy for load 
balancing. And what we've been using is more like "raid 1" redundancy.

It probably makes sense to collaborate on a new driver and model if we agree on 
the topologies we want to support at our individual organizations. Even if we 
can't agree on this, it still makes sense for us to collaborate on determining 
that "basic set of operator features" that all drivers should support, from an 
operator perspective.

I think a management API is necessary: operators and their support personnel 
need to be able to troubleshoot problems down to the device level, and I think 
it makes sense to do this through an OpenStack interface if possible. In order 
to accommodate each vendor's differences here, though, this may only be 
possible if we allow for different drivers to expose "operator controls" in 
their own way.

I do not think any of this should be exposed to the user API we have been 
discussing.

I think it's going to be important to come to some kind of agreement on the 
user API and object model changes before it's going to be possible to start to 
really talk about how to do the management API.

I am completely on board with this! As I have said in a couple of other places 
on this list, Blue Box actually wrote our own software-appliance-based load 
balancing system built on HAProxy, stunnel, corosync/pacemaker, and a series of 
glue scripts (mostly written in Perl, Ruby, and shell) that provide a "back-end 
API" and whatnot. We've actually done this (almost) from scratch twice now, and 
have plans and some work underway to do it a third time, this time to be 
compatible with OpenStack (and specifically the Neutron LBaaS API, hopefully as 
a driver for the same). This will be completely open source, and hopefully 
compliant with OpenStack standards (equivalent licensing, everything written in 
Python, etc.). So far, I've only had time to port over the back-end API and a 
couple of design docs, but if you want to see what we have in mind, here's the 
documentation on this so far:

https://github.com/blueboxgroup/octavia/

In particular, probably the theory of operation document will give you the best 
overview of how it works:

https://github.com/blueboxgroup/octavia/blob/master/doc/theory-of-operation.md

And the virtual appliance API (as it was two months ago; some things will 
definitely change based on the discussions of the last couple of months):
https://github.com/blueboxgroup/octavia/blob/master/doc/virtual-appliance-api.md

Thanks,
Stephen


On Thu, May 1, 2014 at 2:33 PM, Adam Harwell 
<adam.harw...@rackspace.com> wrote:
I am sending this now to gauge interest and get feedback on what I see as an 
impending necessity — updating the existing "haproxy" driver, replacing it, or 
both. Though we're not there yet, it is probably best to at least start the 
discussion now, to hopefully limit some fragmentation that may be starting 
around this concept already.

To begin with, I should probably define some terms. Following is a list of the 
major things I'll be referencing and what I mean by them, since I would like to 
avoid ambiguity as much as possible.

----------------------------------
---- Glossary
----------------------------------
HAProxy: This references two things currently, and I feel this is a source of 
some misunderstanding. When I refer to HAProxy (capitalized), I will be 
referring to the official software package (found here: http://haproxy.1wt.eu/ 
), and when I refer to "haproxy" (lowercase, and in quotes) I will be referring 
to the neutron-lbaas driver (found here: 
https://github.com/openstack/neutron/tree/master/neutron/services/loadbalancer/drivers/haproxy
 ). The fact that the neutron-lbaas driver is named directly after the software 
package seems very unfortunate, and while it is not directly in the scope of 
what I'd like to discuss here, I would love to see it changed to more 
accurately reflect what it is: one specific driver implementation that 
coincidentally uses HAProxy as a backend. More on this later.

Operator Requirements: The requirements that can be found on the wiki page 
here:  
https://wiki.openstack.org/wiki/Neutron/LBaaS/requirements#Operator_Requirements
 and focusing on (but not limited to) the following list:
* Scalability
* DDoS Mitigation
* Diagnostics
* Logging and Alerting
* Recoverability
* High Availability (this is in the User Requirements section, but will be 
largely up to the operator to handle, so I would include it when discussing 
Operator Requirements)

Management API: A restricted API containing resources that Cloud Operators 
could access, including most of the list of Operator Requirements (above).

Load Balancer (LB): I use this term very generically — essentially a logical 
entity that represents one "use case". As used in the sentence: "I have a Load 
Balancer in front of my website." or "The Load Balancer I set up to offload SSL 
Decryption is lowering my CPU load nicely."

----------------------------------
---- Overview
----------------------------------
What we've all been discussing for the past month or two (the API, Object 
Model, etc) is being directly driven by the User and Operator Requirements that 
have somewhat recently been enumerated (many thanks to everyone who has 
contributed to that discussion!). With that in mind, it is hopefully apparent 
that the current API proposals don't directly address many (or really, any) of 
the Operator requirements! Where in either of our API proposals are logging, 
high availability, scalability, DDoS mitigation, etc? I believe the answer is 
that none of these things can possibly be handled by the API, but are really 
implementation details at the driver level. Radware, NetScaler, Stingray, F5 
and HAProxy of any flavour would all have very different ways of handling these 
things (these are just some of the possible backends I can think of). At the 
end of the day, what we really have are the requirements for a driver, which 
may or may not use HAProxy, that we hope will satisfy all of our concerns. That 
said, we may also want to have some form of "Management API" to expose these 
features in a common way.

In this case, we really need to discuss two things:

  1.  Whether to update the existing "haproxy" driver to accommodate these 
Operator Requirements, or whether to start from scratch with a new driver 
(possibly both).
  2.  How to expose these Operator features at the (Management?) API level.

----------------------------------
---- 1) Driver
----------------------------------
I believe the current "haproxy" driver serves a very specific purpose, and 
while it will need some incremental updates, it would be in the best interest 
of the community to also create and maintain a new driver (which it sounds like 
several groups have already begun work on — ack!) that could support a 
different approach. For instance, the current "haproxy" driver is implemented 
by initializing HAProxy processes on a set of shared hosts, whereas there has 
been some momentum behind creating individual Virtual Machines (via Nova) for 
each Load Balancer created, similar to Libra's approach. Alternatively, we 
could use LXC or a similar technology to more effectively isolate LBs and 
assuage concerns about tenant cross-talk (real or imaginary, this has been an 
issue for some customers). Either way, we'd probably need a brand new driver, 
to avoid breaking backwards compatibility with the existing driver (which does 
work perfectly fine in many cases). In fact, it's possible that when we begin 
discussing this as a broader community, we might decide to create more than one 
additional driver (depending on which approaches people want to use and what 
features are most important to them). The only concern I have about that 
outcome is the necessary amount of code-reuse, and whether it would be possible 
to share certain aspects of these drivers without too much copy/pasting.
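
As a rough illustration of the code-reuse point, here is a minimal Python sketch in which the HAProxy configuration is rendered once and only the placement (shared host versus a Nova VM per LB) differs. Everything here is an assumption made for illustration: `render_haproxy_config`, `Placement`, the two placement classes, and the returned location strings are hypothetical, and the placement classes only report where a config would go rather than starting a process or booting a VM.

```python
from typing import List, Protocol


def render_haproxy_config(name: str, bind: str, members: List[str]) -> str:
    """Shared piece: render a minimal HAProxy frontend/backend pair."""
    lines = [
        f"frontend {name}",
        f"    bind {bind}",
        f"    default_backend {name}_pool",
        "",
        f"backend {name}_pool",
    ]
    for i, member in enumerate(members, start=1):
        lines.append(f"    server s{i} {member} check")
    return "\n".join(lines) + "\n"


class Placement(Protocol):
    """Hypothetical interface: decides where a rendered config would run."""

    def deploy(self, name: str, config: str) -> str: ...


class SharedHostPlacement:
    """Stand-in for the shared-host approach; only reports a target path."""

    def deploy(self, name: str, config: str) -> str:
        return f"shared-host:/etc/haproxy/{name}.cfg"


class NovaVMPlacement:
    """Stand-in for the VM-per-LB approach; a real driver would boot a VM."""

    def deploy(self, name: str, config: str) -> str:
        return f"nova-vm:lb-{name}:/etc/haproxy/haproxy.cfg"


def create_lb(placement: Placement, name: str, bind: str,
              members: List[str]) -> str:
    """Render the config once, then hand it to whichever placement is in use."""
    config = render_haproxy_config(name, bind, members)
    return placement.deploy(name, config)


members = ["10.0.0.11:80", "10.0.0.12:80"]
print(render_haproxy_config("web", "203.0.113.10:80", members))
print(create_lb(SharedHostPlacement(), "web", "203.0.113.10:80", members))
print(create_lb(NovaVMPlacement(), "web", "203.0.113.10:80", members))
```

Under this split, a new driver would only need to supply its own placement class; the rendering code would not be copied.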

An example of one possible new driver could be the following (just off the top 
of my head):
* Use a pair of new Nova VMs for each LB (Scalability), configured to use a 
Shared IP (High Availability).
* Log to Swift / Ceilometer (Logging / Alerting / Metering).
* Provide calls that could be exposed via a Management API to show low level 
diagnostic details (Diagnostics).
* Provide calls that could be exposed via a Management API to allow 
syncing/reloading existing LBs or moving them across clusters (Recoverability, 
DDoS Mitigation).
This new driver would be named to reflect what features it provides, or at 
least given a unique name that can be referenced without confusion (something 
like "OpenHA" or "NovaHA" would work if that's not taken).
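
To make the example above slightly more concrete, here is a minimal Python sketch of how such a driver might describe one LB's deployment: a pair of VMs, a shared IP, and a log destination. All names (`LoadBalancerPlan`, `plan_load_balancer`, the field names, the VM naming scheme, and the `swift://` path) are hypothetical and are not taken from Nova, Swift, or any existing neutron-lbaas driver.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoadBalancerPlan:
    """Hypothetical description of one LB deployment (not a Neutron object)."""

    lb_id: str
    vm_names: Tuple[str, str]  # the pair of Nova VMs backing this LB
    shared_ip: str             # address shared by the two VMs
    log_container: str         # where logs would be shipped (assumed layout)


def plan_load_balancer(lb_id: str, shared_ip: str) -> LoadBalancerPlan:
    """Build the plan for one LB: a VM pair behind a single shared IP."""
    return LoadBalancerPlan(
        lb_id=lb_id,
        vm_names=(f"lb-{lb_id}-a", f"lb-{lb_id}-b"),
        shared_ip=shared_ip,
        log_container=f"swift://lb-logs/{lb_id}",
    )


plan = plan_load_balancer("web01", "203.0.113.10")
print(plan)
```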

----------------------------------
---- 2) Management API
----------------------------------
Going forward, it should then be required (can we enforce this?) that any 
mainline driver include support for calls to handle these named Operator 
Requirements, for example: obtaining logs (or log locations?), diagnostic 
information, and admin type actions including rebuilding or migrating LB 
instances. So far we haven't really talked about any of these features in 
depth, though I believe the general need for a Management API was alluded to on 
several occasions. Should we shelve this discussion until after we have the 
User API specification locked down? Should we begin defining a contract for 
this Management API at the summit, since it would be the main gateway to the 
Operator Requirements that we have all been stressing recently?
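
On the "can we enforce this?" question, one possible mechanism in Python is an abstract base class: a driver that omits any of the operator calls cannot be instantiated. This is only a sketch under assumptions; `OperatorDriverContract`, its method names, `ToyDriver`, and `IncompleteDriver` are hypothetical and do not come from the existing neutron-lbaas code.

```python
import abc


class OperatorDriverContract(abc.ABC):
    """Hypothetical contract every mainline driver would have to implement."""

    @abc.abstractmethod
    def get_log_location(self, lb_id: str) -> str:
        """Return where the logs for this LB can be found."""

    @abc.abstractmethod
    def get_diagnostics(self, lb_id: str) -> dict:
        """Return low-level diagnostic details for this LB."""

    @abc.abstractmethod
    def rebuild(self, lb_id: str) -> None:
        """Rebuild the LB instance."""

    @abc.abstractmethod
    def migrate(self, lb_id: str, target: str) -> None:
        """Move the LB instance to another host or cluster."""


class ToyDriver(OperatorDriverContract):
    """In-memory stand-in used only to exercise the contract."""

    def __init__(self):
        self.location = {}  # lb_id -> current host or cluster

    def get_log_location(self, lb_id):
        return f"/var/log/lbaas/{lb_id}.log"

    def get_diagnostics(self, lb_id):
        return {"lb_id": lb_id, "host": self.location.get(lb_id, "unplaced")}

    def rebuild(self, lb_id):
        # Keep the current placement if there is one, else pick a default.
        self.location.setdefault(lb_id, "cluster-a")

    def migrate(self, lb_id, target):
        self.location[lb_id] = target


class IncompleteDriver(OperatorDriverContract):
    """Implements only one call, so instantiating it raises TypeError."""

    def get_log_location(self, lb_id):
        return "nowhere"


driver = ToyDriver()
driver.rebuild("web01")
driver.migrate("web01", "cluster-b")
print(driver.get_diagnostics("web01"))

try:
    IncompleteDriver()
except TypeError as exc:
    print("rejected:", exc)
```

This only checks that the methods exist, not that they behave sensibly, so it would complement rather than replace an agreed contract.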

----------------------------------
---- Summary
----------------------------------
I would apologize for not having much concrete specification here, but I think 
it is better to validate my basic assumptions first, before jumping deeper down 
this rabbit hole. The type of comments I'm hoping to prompt are along the lines 
of:
* "We should just focus on the existing haproxy driver."
* "We should definitely collaborate to make a new driver as a community."
* "I don't think a Management API is necessary."
* "This is definitely what I was thinking we'd need to do."
Any specific implementation details I've mentioned are intended to be taken 
as one possible example, not as a well thought out proposal. I am, as one might 
say, "speaking my mind". My hope is that some of this will simmer on the 
general subconscious. I'd like to hear what the general consensus is on these 
topics, because these are some of the assumptions I've been operating under 
during the rest of our discussions, and if they're invalid, I may need to 
rebase my view on the API discussion as a whole.

Thanks y'all, I'm looking forward to getting some additional viewpoints!
--Adam Harwell (rm_work)

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org<mailto:OpenStack-dev@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



--
Stephen Balukoff
Blue Box Group, LLC
(800)613-4305 x807