Re: [openstack-dev] [Nova][Neutron][Technical Committee] nova-network -> Neutron. Throwing a wrench in the Neutron gap analysis
On Aug 5, 2014, at 4:52 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:

I'm pretty sure yahoo is another case, with a large set of clusters still on nova-network ;) I believe we have been active in these discussions, although I'm unsure what was discussed at the meetup (being that I had a planned vacation -- right now, actually). Anyways, I think yahoo is fine with being a use case, but I can check when I get back.

tl;dr: we're willing to be a use case, but our internal timeline is such that in all likelihood this will end up being a post-mortem.

We (Yahoo) have thousands of pets that need to be migrated, as well as an unspecified number of cattle. A "live" strategy is strongly preferred (I'm not saying "live migration", since in our case it needs to be an in-place operation, not a shuffling of instances around). But several seconds of network outage? No problem. Disabling VM creation/deletion, or even the entire Nova API, for a few hours? We'll take the grumbling from our internal teams. A suspend/snapshot/cold-migrate would be an absolute last resort, and frankly could push back our aggressive migration timeline significantly.

We're interested in Oleg Bondarev's solution, and I've even made some suggestions in review comments as to how it can be made more "live", but it's clear the greater Nova community has a number of objections to it. Chief among these are the addition of code and an API to Nova for what is essentially a one-shot operation, the inability to deal with more complicated configurations, and the reliance on features only available in a fresh release of libvirt. (As it turns out, only the last of these affects us, but we're a bit of an outlier in the community.) It's still under consideration for us, even if the community rejects the approach.

As an alternative, we're looking at DB-to-DB translation, with a one-shot script run on the compute nodes to move network taps. We'd actually worked this out back in the Quantum/Folsom era but backed off due to OVS/device-driver issues (don't ask -- I still get nightmares). This, of course, would require an API outage, and it is a big-bang approach (one of the attractions of Oleg's approach is that we could migrate a few low-value instances and examine the results carefully before proceeding). But once again, our solution is likely to be of limited interest -- flat network without DHCP, no routers or floating IPs, unconventional (for OpenStack) use of VLANs -- though we'd be happy to share once the dust settles.

-Ed Hall
edh...@yahoo-inc.com
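To make the "one-shot script run on the compute nodes to move network taps" step concrete, here is a minimal illustrative sketch -- not the actual Yahoo tooling. It assumes a flat nova-network deployment where each guest tap hangs off a Linux bridge such as br100, a Neutron Open vSwitch target where the port belongs on br-int, and Neutron port records pre-created by the DB-to-DB pass. The helper names, bridge names, and example values are hypothetical; the external-ids keys are the ones the OVS plugging convention generally relies on.

#!/usr/bin/env python
# Illustrative only: a minimal "move one tap" helper in the spirit of the
# one-shot compute-node script described above.  Bridge names, UUIDs, and
# the move_tap() helper are hypothetical.

import subprocess


def run(cmd):
    # A one-shot migration script should fail loudly, not limp on.
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)


def move_tap(tap, old_bridge, neutron_port_id, mac, instance_uuid):
    # Detach the tap from the nova-network Linux bridge...
    run(["brctl", "delif", old_bridge, tap])
    # ...and plug it into the OVS integration bridge, with the external-ids
    # the Neutron OVS agent looks at when wiring up a port.
    run(["ovs-vsctl", "--may-exist", "add-port", "br-int", tap, "--",
         "set", "Interface", tap,
         "external-ids:iface-id=%s" % neutron_port_id,
         "external-ids:iface-status=active",
         "external-ids:attached-mac=%s" % mac,
         "external-ids:vm-uuid=%s" % instance_uuid])


if __name__ == "__main__":
    # Example values only; in practice these would come from the DB-to-DB
    # translation that created the corresponding Neutron ports.
    move_tap("tapdeadbeef-01", "br100",
             "deadbeef-0101-4c55-9d0e-0123456789ab",
             "fa:16:3e:00:00:01",
             "00000000-1111-2222-3333-444444444444")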
On Aug 5, 2014, at 7:11 PM, Joe Gordon joe.gord...@gmail.com wrote:

On Aug 5, 2014 12:57 PM, Jay Pipes jaypi...@gmail.com wrote:

On 08/05/2014 03:23 PM, Collins, Sean wrote:

On Tue, Aug 05, 2014 at 12:50:45PM EDT, Monty Taylor wrote:

However, I think the cost to providing that path far outweighs the benefit in the face of other things on our plate.

Perhaps those large operators that are hoping for a zero-downtime nova-network-to-Neutron live migration could dedicate resources to this requirement? It is my direct experience that features that are important to a large organization will require resources from that very organization to be completed.

Indeed, that's partly why I called out Metacloud in the original post, as they were brought up as a deployer with this potential need. Please, if there are any other shops that:

Perhaps I am not remembering all the details discussed at the nova mid-cycle, but Metacloud was brought up as an example of a company that uses nova-network rather than Neutron, not as a company that needs live migration. And that getting them to move to Neutron would be a good litmus test for nova-network performance parity, something that is very hard to do in the gate. But that was all said without any folks from Metacloud in the room, so we may both be wrong.

* Currently deploy nova-network
* Need to move to Neutron
* Have tenants that cannot tolerate any downtime due to a cold migration

Please do comment on this thread and speak up.

Best,
-jay
Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: Driver / Management API
Hi all,

At Yahoo, load balancing is heavily used throughout our stack for both HA and load distribution, even within the OpenStack control plane itself. This involves a variety of technologies, depending on scale and other requirements: for large scale plus L7 we use Apache Traffic Server, L3DSR is the mainstay of our highest-bandwidth applications, and a variety of technologies are used for simple HA and lighter loads.

Each of these technologies has its own special operational requirements, and although a single well-abstracted tenant-facing API to control all of them is much to be desired, there can be no such luck for operators. A major concern for us is ensuring that when a tenant* has an operational issue they can communicate needs and concerns to operators quickly and effectively. This means that any operator API must "speak the same language" as the user API while exposing the necessary information and controls for the underlying technology.

*In this case a "tenant" might represent a publicly-exposed URL with tens of millions of users, or an unexposed service which could impact several such web destinations.

-Ed

On May 2, 2014, at 9:34 AM, Eichberger, German german.eichber...@hp.com wrote:

Hi Stephen + Adam,

Thanks, Stephen and Adam, for starting this discussion. I also see several different drivers. We at HP indeed use a pool of software load-balancing appliances to replace any failing one. However, we are also interested in a model where we have load balancers in hot standby... My hope with this effort is that we can somehow reuse the haproxy implementation and deploy it in different ways depending on the necessary scalability and availability needs -- akin to creating a strategy which deploys the same haproxy control layer in a pool, on a Nova VM, etc.

German

From: Stephen Balukoff [mailto:sbaluk...@bluebox.net]
Sent: Thursday, May 01, 2014 7:44 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: Driver / Management API

Hi Adam,

Thank you very much for starting this discussion! In answer to your questions, from my perspective:

1. I think that it makes sense to start at least one new driver that focuses on running software virtual appliances on Nova nodes (the NovaHA you referred to above). The existing haproxy driver should not go away, as I think it solves problems for small to medium-size deployments, and does well for setting up, for example, a 'development' or 'QA' load balancer that won't need to scale but needs to duplicate much of the functionality of the production load balancer(s).

On this note, we may want to actually create several different drivers depending on the appliance model that operators are using. From the discussion about HA that I started a couple of weeks ago, it sounds like HP is using an HA model that concentrates on pulling additional instances from a waiting pool. The Stingray solution you're using sounds like RAID 5 redundancy for load balancing, and what we've been using is more like RAID 1 redundancy. It probably makes sense to collaborate on a new driver and model if we agree on the topologies we want to support at our individual organizations. Even if we can't agree on this, it still makes sense for us to collaborate on determining that basic set of operator features that all drivers should support, from an operator perspective.
I think a management API is necessary -- operators and their support personnel need to be able to troubleshoot problems down to the device level, and I think it makes sense to do this through an OpenStack interface if possible. In order to accommodate each vendor's differences here, though, this may only be possible if we allow different drivers to expose operator controls in their own way. I do not think any of this should be exposed in the user API we have been discussing. I think it's going to be important to come to some kind of agreement on the user API and object model changes before it's going to be possible to start to really talk about how to do the management API.

I am completely on board with this! As I have said in a couple of other places on this list, Blue Box actually wrote our own software-appliance-based load balancing system built on HAProxy, stunnel, corosync/pacemaker, and a series of glue scripts (mostly written in Perl, Ruby, and shell) that provide a back-end API and whatnot. We've actually done this (almost) from scratch twice now, and have plans and some work underway to do it a third time -- this time to be compatible with OpenStack (and specifically the Neutron LBaaS API, hopefully as a driver for the same). This will be completely open source, and hopefully compliant with OpenStack standards (equivalent licensing, everything written in Python, etc.). So far, I've only had time to port over the back-end API and
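Stephen's suggestion -- a baseline set of operator features that every driver supports, with each driver free to expose additional controls in its own way -- could look roughly like the sketch below. This is speculative: none of these classes or method names exist in Neutron LBaaS, and HaproxyPoolDriver is only a toy stand-in for the pool-of-standby-appliances model German describes.

# Speculative sketch of a common operator-facing driver contract.
# All class and method names are invented for illustration; this is not
# an existing Neutron LBaaS interface.

import abc


class OperatorDriverBase(abc.ABC):
    """Baseline operator features every driver would be expected to support."""

    @abc.abstractmethod
    def list_devices(self):
        """Return the backend devices (appliances, VMs, processes) in use."""

    @abc.abstractmethod
    def device_status(self, device_id):
        """Health and utilization details for one device, for troubleshooting."""

    @abc.abstractmethod
    def evacuate_device(self, device_id):
        """Drain a failing device and replace it without touching tenant config."""

    def vendor_actions(self):
        """Optional extra controls a particular driver chooses to expose."""
        return {}


class HaproxyPoolDriver(OperatorDriverBase):
    """Toy stand-in for the 'pool of standby appliances' model."""

    def __init__(self, inventory, spares):
        self.inventory = inventory  # device_id -> {"vips": [...], "healthy": bool}
        self.spares = spares        # idle appliance ids, ready to take over

    def list_devices(self):
        return sorted(self.inventory)

    def device_status(self, device_id):
        return self.inventory[device_id]

    def evacuate_device(self, device_id):
        failed = self.inventory.pop(device_id)
        replacement = self.spares.pop(0)
        # A real driver would configure haproxy on the replacement and
        # repoint the VIPs; here we only record the swap.
        self.inventory[replacement] = {"vips": failed["vips"], "healthy": True}
        return {"evacuated": device_id, "replacement": replacement}

The point is not these particular methods but the shape: the contract stays vendor-neutral while the device-level troubleshooting detail lives in the driver.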
Re: [openstack-dev] [Neutron][LBaaS] Object Model discussion
On Feb 25, 2014, at 10:10 AM, Stephen Balukoff sbaluk...@bluebox.net wrote:

On Feb 25, 2014 at 3:39 AM, enikano...@mirantis.com wrote:

Agree, however actual hardware is beyond the logical LBaaS API but could be a part of an admin LBaaS API.

Aah yes -- in my opinion, users should almost never be exposed to anything that represents a specific piece of hardware, but cloud administrators must be. The logical constructs the user is exposed to can come close to what an actual piece of hardware is, but again, we should be abstract enough that a cloud admin can swap out one piece of hardware for another without affecting the user's workflow, application configuration, (hopefully) availability, etc.

I recall you said previously that the concept of having an 'admin API' had been discussed earlier, but I forget the resolution behind this (if there was one). Maybe we should revisit this discussion? I tend to think that if we acknowledge the need for an admin API, as well as some of the core features it's going to need, and contrast this with the user API (which I think is mostly what Jay and Mark McClain are rightly concerned about), it'll start to become obvious which features belong where, and what kind of data model will emerge to support both APIs.

[I'm new to this discussion; my role at my employer has been shifted from an internal to a community focus, and I'm madly attempting to come up to speed. I'm a software developer with an operations focus; I've worked with OpenStack since Diablo as Yahoo's team lead for network integration.]

Two levels (user and admin) would be the minimum, but our experience over time is that even administrators occasionally need to be saved from themselves. This suggests that, rather than two or more separate APIs, a single API with multiple roles is needed. Certain operations and attributes would only be accessible to someone acting in an appropriate role. This might seem over-elaborate at first glance, but there are other dividends: a single API is more likely to be consistent, and to be maintained consistently as it evolves.

By taking a role-wise view, the hierarchy of concerns is clarified. If you focus on the data model first, you are more likely to produce an arrangement that mirrors the hardware but presents difficulties in representing and implementing user and operator intent.

Just some general insights/opinions -- take them for what they're worth.

-Ed
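Ed's "single API with multiple roles" idea can be illustrated with a small sketch: one endpoint set for everyone, with certain operations and attributes gated by the caller's role. This is not an existing OpenStack interface -- in a real deployment this is roughly the territory of policy rules -- and the class, method, and role names below are all hypothetical.

# Illustrative only: one API surface, with role-gated operations and
# attributes.  Nothing here corresponds to an existing OpenStack API.

from functools import wraps


class Forbidden(Exception):
    pass


def requires_role(role):
    """Allow the call only when the request context carries the given role."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(self, context, *args, **kwargs):
            if role not in context.get("roles", ()):
                raise Forbidden("%s requires role %r" % (fn.__name__, role))
            return fn(self, context, *args, **kwargs)
        return wrapper
    return decorator


class LoadBalancerAPI(object):
    """A single API shared by tenants, operators, and administrators."""

    def show(self, context, lb_id):
        lb = {"id": lb_id, "vip": "203.0.113.10", "status": "ACTIVE",
              "backend_device": "appliance-42"}  # illustrative data
        if "operator" not in context.get("roles", ()):
            # Tenants see the logical object, never the hardware behind it.
            lb.pop("backend_device")
        return lb

    @requires_role("operator")
    def migrate_backend(self, context, lb_id, target_device):
        return {"migrated": lb_id, "to": target_device}

    @requires_role("admin")
    def override_quota(self, context, tenant_id, new_quota):
        return {"tenant": tenant_id, "loadbalancer_quota": new_quota}


if __name__ == "__main__":
    api = LoadBalancerAPI()
    tenant = {"roles": ["member"]}
    operator = {"roles": ["member", "operator"]}
    print(api.show(tenant, "lb-1"))       # logical view only
    print(api.show(operator, "lb-1"))     # includes backend_device
    print(api.migrate_backend(operator, "lb-1", "appliance-43"))

Everyone talks to the same object model, so operators and tenants really do "speak the same language"; the role only widens or narrows what is visible and permitted.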