Re: [openstack-dev] [Nova][Neutron][Technical Committee] nova-network - Neutron. Throwing a wrench in the Neutron gap analysis

2014-08-06 Thread Ed Hall

On Aug 5, 2014, at 4:52 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:

 I'm pretty sure yahoo is another case, with a large set of clusters on 
 nova-network still ;)
 
 I believe we have been active in these discussions, although I'm unsure what 
 was discussed at the meetup (being on planned vacation right now, actually). 
 
 Anyways I think yahoo is fine with being a use case, but I can check when I 
 get back.
 

tl;dr: we’re willing to be a use case, but our internal timeline is such that, in all
likelihood, our contribution will come as a post-mortem.

We (Yahoo) have thousands of pets that need to be migrated, as well as an unspecified
number of cattle. A “live” strategy is strongly preferred (I’m not saying “live
migration” since in our case it needs to be an in-place operation, not shuffling
instances around). But several seconds of network outage? No problem. Disabling VM
creation/deletion, or even the entire Nova API, for a few hours? We’ll take the
grumbling from our internal teams. A suspend/snapshot/cold-migrate would be an
absolute last resort, and frankly could push back our aggressive migration timeline
significantly.

We’re interested in Oleg Bondarev’s solution, and I’ve even made some suggestions in
review comments as to how it can be made more “live,” but it’s clear the greater Nova
community has a number of objections to it. Chief among these are the addition of code
and an API to Nova for what is essentially a one-shot operation, its inability to deal
with more complicated configurations, and its reliance on features only available in a
very recent release of libvirt. (As it turns out, only the latter affects us, but we’re
a bit of an outlier in the community.) It’s still under consideration for us, even if
the community rejects the approach.

As an alternative, we’re looking at DB-to-DB translation, with a one-shot script run
on the compute nodes to move network taps. We’d actually worked this out back in the
Quantum/Folsom era but backed off due to OVS/device driver issues (don’t ask -- I
still get nightmares). This, of course, would require an API outage, and is a
big-bang approach (one of the attractions of Oleg’s approach is that we can migrate a
few low-value instances and then examine results carefully before proceeding). But
once again, our solution is likely to be of limited interest -- flat network without
DHCP, no routers or floating IPs, unconventional (for OpenStack) use of VLANs --
though we’d be happy to share once the dust settles.
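
For the curious, a rough sketch of what the per-tap move on a compute node might look
like in our Linux-bridge-to-OVS case (purely illustrative; the bridge names, the helper
itself, and the assumption that the Neutron port rows already exist from the DB
translation are all mine, not a tested procedure):

    # Illustrative only: re-home one instance tap from the nova-network
    # Linux bridge onto the Neutron OVS integration bridge.  Assumes the
    # corresponding Neutron port was already created by the DB-to-DB
    # translation step.
    import subprocess

    def move_tap(tap, port_id, mac, vm_uuid,
                 old_bridge='br100', new_bridge='br-int'):
        # Detach the tap from the nova-network Linux bridge
        subprocess.check_call(['brctl', 'delif', old_bridge, tap])
        # Attach it to br-int with the external-ids the OVS agent looks for
        subprocess.check_call([
            'ovs-vsctl', '--', '--may-exist', 'add-port', new_bridge, tap,
            '--', 'set', 'Interface', tap,
            'external-ids:iface-id=%s' % port_id,
            'external-ids:iface-status=active',
            'external-ids:attached-mac=%s' % mac,
            'external-ids:vm-uuid=%s' % vm_uuid,
        ])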

-Ed Hall
edh...@yahoo-inc.com

 
 On Aug 5, 2014, at 7:11 PM, Joe Gordon joe.gord...@gmail.com wrote:
 
 
 On Aug 5, 2014 12:57 PM, Jay Pipes jaypi...@gmail.com wrote:
 
  On 08/05/2014 03:23 PM, Collins, Sean wrote:
 
  On Tue, Aug 05, 2014 at 12:50:45PM EDT, Monty Taylor wrote:
 
  However, I think the cost to providing that path far outweighs
  the benefit in the face of other things on our plate.
 
 
  Perhaps those large operators that are hoping for a
  Nova-Network-Neutron zero-downtime live migration, could dedicate
  resources to this requirement? It is my direct experience that features
  that are important to a large organization will require resources
  from that very organization to be completed.
 
 
  Indeed, that's partly why I called out Metacloud in the original post, as 
  they were brought up as a deployer with this potential need. Please, if 
  there are any other shops that:
 Perhaps I am not remembering all the details discussed at the nova 
 mid-cycle, but Metacloud was brought up as an example of a company that uses 
 nova-network and not Neutron, not as a company that needs live migration. 
 The idea was that getting them to move to Neutron would be a good litmus 
 test for nova-network performance parity, something that is very hard to 
 check in the gate. But that was all said without any folks from Metacloud 
 in the room, so we may both be wrong.
 
 
  * Currently deploy nova-network
  * Need to move to Neutron
  * Their tenants cannot tolerate any downtime due to a cold migration
 
  Please do comment on this thread and speak up.
 
  Best,
  -jay
 
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: Driver / Management API

2014-05-02 Thread Ed Hall
Hi all,

At Yahoo, load balancing is heavily used throughout our stack for both HA and load
distribution, even within the OpenStack control plane itself. This involves a variety
of technologies, depending on scale and other requirements: for large scale + L7 we
use Apache Traffic Server, L3DSR is the mainstay of the highest-bandwidth
applications, and an assortment of other tools handle simple HA and lighter loads.

Each of these technologies has its own special operational requirements, and although
a single well-abstracted tenant-facing API to control all of them is much to be
desired, there can be no such luck for operators. A major concern for us is ensuring
that when a tenant* has an operational issue they can communicate needs and concerns
to operators quickly and effectively. This means that any operator API must “speak the
same language” as the user API while exposing the necessary information and controls
for the underlying technology.

*In this case a “tenant” might represent a publicly-exposed URL with tens of millions
of users, or an unexposed service which could impact several such web destinations.

  -Ed


On May 2, 2014, at 9:34 AM, Eichberger, German german.eichber...@hp.com wrote:

Hi Stephen + Adam,

Thanks Stephen and Adam for starting this discussion. I also see several 
different drivers. We at HP indeed use a pool of software load balancing 
appliances to replace any failing one. However, we are also interested in a 
model where we have load balancers in hot standby…

My hope with this effort is that we can somehow reuse the haproxy implementation and
deploy it in different ways depending on the necessary scalability and availability
needs -- akin to creating a strategy which deploys the same haproxy control layer in a
pool, on a Nova VM, etc.
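
A purely hypothetical sketch of that “strategy” idea -- one haproxy control layer with
pluggable deployment strategies; every class and method name below is invented for
illustration, not a proposal for actual driver code:

    # Hypothetical sketch only: one haproxy control layer, deployed via
    # interchangeable strategies.  All names are invented.
    import abc

    class DeploymentStrategy(abc.ABC):
        @abc.abstractmethod
        def provision(self, loadbalancer):
            """Place the haproxy control layer for this load balancer."""

    class SparePoolStrategy(DeploymentStrategy):
        # Pull a pre-built appliance from a warm standby pool.
        def provision(self, loadbalancer):
            raise NotImplementedError

    class NovaVMStrategy(DeploymentStrategy):
        # Boot a fresh Nova VM running the same haproxy control layer.
        def provision(self, loadbalancer):
            raise NotImplementedError

    class ActiveStandbyStrategy(DeploymentStrategy):
        # Deploy an active/standby pair for hot standby.
        def provision(self, loadbalancer):
            raise NotImplementedError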

German


From: Stephen Balukoff [mailto:sbaluk...@bluebox.net]
Sent: Thursday, May 01, 2014 7:44 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: 
Driver / Management API

Hi Adam,

Thank you very much for starting this discussion!  In answer to your questions, from
my perspective:

1. I think that it makes sense to start at least one new driver that focuses on 
running software virtual appliances on Nova nodes (the NovaHA you referred to 
above). The existing haproxy driver should not go away as I think it solves 
problems for small to medium size deployments, and does well for setting up, 
for example, a 'development' or 'QA' load balancer that won't need to scale, 
but needs to duplicate much of the functionality of the production load 
balancer(s).

On this note, we may want to actually create several different drivers depending on
the appliance model that operators are using. From the discussion about HA that I
started a couple weeks ago, it sounds like HP is using an HA model that concentrates
on pulling additional instances from a waiting pool. The Stingray solution you're
using sounds like RAID 5 redundancy for load balancing, and what we've been using is
more like RAID 1 redundancy.

It probably makes sense to collaborate on a new driver and model if we agree on 
the topologies we want to support at our individual organizations. Even if we 
can't agree on this, it still makes sense for us to collaborate on determining 
that basic set of operator features that all drivers should support, from an 
operator perspective.

I think a management API is necessary--  operators and their support personnel 
need to be able to troubleshoot problems down to the device level, and I think 
it makes sense to do this through an OpenStack interface if possible. In order 
to accommodate each vendor's differences here, though, this may only be 
possible if we allow for different drivers to expose operator controls in 
their own way.

I do not think any of this should be exposed to the user API we have been 
discussing.

I think it's going to be important to come to some kind of agreement on the 
user API and object model changes before it's going to be possible to start to 
really talk about how to do the management API.

I am completely on board with this! As I have said in a couple other places on this
list, Blue Box actually wrote our own software-appliance-based load balancing system
built on HAProxy, stunnel, corosync/pacemaker, and a series of glue scripts (mostly
written in Perl, Ruby, and shell) that provide a back-end API and whatnot. We've
actually done this (almost) from scratch twice now, and have plans and some work
underway to do it a third time -- this time to be compatible with OpenStack (and
specifically the Neutron LBaaS API, hopefully as a driver for the same). This will be
completely open source, and hopefully compliant with OpenStack standards (equivalent
licensing, everything written in Python, etc.). So far, I've only had time to port
over the back-end API and 
Re: [openstack-dev] [Neutron][LBaaS] Object Model discussion

2014-02-25 Thread Ed Hall

On Feb 25, 2014, at 10:10 AM, Stephen Balukoff sbaluk...@bluebox.net wrote:
 On Feb 25, 2014 at 3:39 AM, enikano...@mirantis.com wrote:
Agreed; however, actual hardware is beyond the logical LBaaS API, though it could be
part of an admin LBaaS API.

Aah yes--  In my opinion, users should almost never be exposed to anything that 
represents a specific piece of hardware, but cloud administrators must be. The 
logical constructs the user is exposed to can come close to what an actual 
piece of hardware is, but again, we should be abstract enough that a cloud 
admin can swap out one piece of hardware for another without affecting the 
user's workflow, application configuration, (hopefully) availability, etc.

I recall you said previously that the concept of having an 'admin API' had been 
discussed earlier, but I forget the resolution behind this (if there was one). 
Maybe we should revisit this discussion?

I tend to think that if we acknowledge the need for an admin API, as well as 
some of the core features it's going to need, and contrast this with the user 
API (which I think is mostly what Jay and Mark McClain are rightly concerned 
about), it'll start to become obvious which features belong where, and what 
kind of data model will emerge which supports both APIs.

[I’m new to this discussion; my role at my employer has been shifted from an internal
to a community focus and I’m madly attempting to come up to speed. I’m a software
developer with an operations focus; I’ve worked with OpenStack since Diablo as
Yahoo’s team lead for network integration.]

Two levels (user and admin) would be the minimum. But our experience over time is
that even administrators occasionally need to be saved from themselves. This suggests
that, rather than two or more separate APIs, a single API with multiple roles is
needed. Certain operations and attributes would only be accessible to someone acting
in an appropriate role.

This might seem over-elaborate at first glance, but there are other dividends: a
single API is more likely to be consistent, and to stay consistent as it evolves. By
taking a role-wise view, the hierarchy of concerns is clarified. If you focus on the
data model first, you are more likely to produce an arrangement that mirrors the
hardware but presents difficulties in representing and implementing user and operator
intent.
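
To make the role-wise idea a bit more concrete, a rough, purely hypothetical sketch
(the field names, role name, and helper below are all invented, not a concrete API
proposal):

    # Hypothetical sketch: one API, with certain attributes gated by role,
    # rather than a separate admin API.  All names are invented.
    OPERATOR_ONLY_FIELDS = {'device_id', 'appliance_host', 'failover_policy'}

    class Forbidden(Exception):
        pass

    def update_loadbalancer(context, lb, updates):
        # 'lb' is the logical load balancer (a dict); 'context.roles' is the
        # caller's role list.  Everyone uses the same endpoint and objects,
        # but device-level attributes require an operator role.
        touched = OPERATOR_ONLY_FIELDS & set(updates)
        if touched and 'lb_operator' not in getattr(context, 'roles', []):
            raise Forbidden('fields %s require the lb_operator role'
                            % sorted(touched))
        lb.update(updates)
        return lb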

Just some general insights/opinions — take for what they’re worth.

 -Ed

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev