Re: [openstack-dev] [Neutron] Stepping down from core
Sad to see you go, Carl. Good luck!

Eugene.

On Thu, Nov 17, 2016 at 10:42 AM, Carl Baldwin wrote:
> Neutron (and Openstack),
>
> It is with regret that I report that my work situation has changed such
> that I'm not able to keep up with my duties as a Neutron core reviewer, L3
> lieutenant, and drivers team member. My participation has dropped off
> considerably since Newton was released and I think it is fair to step down
> and leave an opening for others to fill. There is no shortage of talent in
> Neutron and Openstack and I know I'm leaving it in good hands.
>
> I will be more than happy to come back to full participation in Openstack
> and Neutron in the future if things change again in that direction. This is
> a great community and I've had a great time participating and learning with
> you all.
>
> Well, I don't want to drag this out. I will still be around on IRC and
> will be happy to help out where I am able. Feel free to ping me.
>
> Carl

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [horizon] Logo
Doge coin http://dogecoin.com/

On Sat, Jul 16, 2016 at 7:22 PM, Timur Sufiev wrote:
> Let's take it before anybody else does :)!
>
> On Sat, Jul 16, 2016 at 6:15 PM Richard Jones wrote:
>> +2 very appropriate
>>
>> On 17 July 2016 at 11:01, Diana Whitten wrote:
>>> Dunno if there have been any suggestions, but I'd like to suggest a
>>> Shiba Inu for the Horizon logo mascot.
>>>
>>> If you are unfamiliar with a Shiba Inu, take a look here:
>>> http://vignette1.wikia.nocookie.net/sanicsource/images/9/97/Doge.jpg/revision/latest?cb=20160112233015
>>>
>>> Our Shiba should definitely look shocked too.
>>>
>>> - Diana

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][ovs] The way we deal with MTU
That's interesting. In our deployments we do something like br-ex (linux
bridge, mtu 9000) - OVSIntPort (mtu 65000) - br-floating (ovs bridge, mtu
1500) - br-int (ovs bridge, mtu 1500). qg interfaces are then created in
br-int, traffic goes all the way through, and that altogether allows jumbo
frames over the external network. For that reason I thought that MTU inside
OVS doesn't really matter. This, however, is for OVS 2.4.1; I wonder if that
behavior has changed and if the description is available anywhere.

Thanks,
Eugene.

On Mon, Jun 13, 2016 at 9:49 AM, Ihar Hrachyshka wrote:
> Hi all,
>
> in Mitaka, we introduced a bunch of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path, starting from
> the instance internal interface, thru the hybrid bridge, into the br-int;
> as well as the router data path (qr), has the proper MTU value set on all
> participating devices. On the hypervisor side, both Nova and Neutron take
> part in it, setting it with the ip-link tool based on what the Neutron
> plugin calculates for us. So far so good.
>
> Turns out that for OVS, it does not work as expected in regards to br-int.
> There was a bug reported lately: https://launchpad.net/bugs/1590397
>
> Briefly, when we try to set MTU on a device that is plugged into a bridge,
> and if the bridge already has another port with a lower MTU, the bridge
> itself inherits the MTU from that latter port, and the Linux kernel (?)
> does not allow setting the MTU on the first device at all, making ip link
> calls ineffective.
>
> AFAIU this behaviour is consistent with Linux bridging rules: you can’t
> have ports of different MTU plugged into the same bridge.
>
> Now, that’s a huge problem for Neutron, because we plug ports that belong
> to different networks (and that hence may have different MTUs) into the
> same br-int bridge.
>
> So I played with the code locally a bit and spotted that currently, we set
> MTU for router ports before we move their devices into router namespaces.
> And once the device is in a namespace, ip-link actually works. So I wrote a
> fix with a functional test that proves the point:
> https://review.openstack.org/#/c/327651/ The fix was validated by the
> reporter of the original bug and seems to fix the issue for him.
>
> It’s suspicious that it works from inside a namespace but not when the
> device is still in the root namespace. So I reached out to Jiri Benc from
> our local Open vSwitch team, and here is a quote:
>
> ===
>
> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in
> other netns and thus cannot enforce the correct MTU.
>
> We'll hopefully fix it and disallow incorrect MTU setting even across
> namespaces. However, it requires significant effort and rework of ovs
> name space handling.
>
> You should not depend on the current buggy behavior. Don't set MTU of
> the internal interfaces higher than the rest of the bridge, it's not
> supported. Hacking this around by moving the interface to a netns is
> exploiting of a bug.
>
> We can certainly discuss whether this limitation could be relaxed.
> Honestly, I don't know, it's for a discussion upstream. But as of now,
> it's not supported and you should not do it.”
>
> So basically, as long as we try to plug ports with different MTUs into the
> same bridge, we are utilizing a bug in Open vSwitch that may break us at
> any time.
>
> I guess our alternatives are:
> - either redesign the bridge setup for openvswitch to e.g. maintain a
>   bridge per network;
> - or talk to ovs folks on whether they may support that for us.
>
> I understand the former option is too scary. It opens lots of questions,
> including upgrade impact since it will obviously introduce a dataplane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with the vswitch folks. Any better
> ideas?
>
> It’s also not clear whether we want to proceed with my immediate fix.
> Advice is welcome.
>
> Thanks,
> Ihar

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
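The constraint described above (a bridge inherits the lowest MTU of its
ports, so raising a single port's MTU silently fails) is easy to check for
before relying on ip-link. A rough sketch in Python follows; the device and
port names are made up for illustration and this is not Neutron's actual
agent code:

    import re
    import subprocess

    def get_mtu(device):
        # Read the current MTU of a device via iproute2.
        out = subprocess.check_output(['ip', '-o', 'link', 'show', 'dev', device])
        match = re.search(r'mtu (\d+)', out.decode())
        return int(match.group(1)) if match else None

    def safe_set_mtu(device, mtu, bridge_ports):
        # Refuse to raise the MTU above what the bridge's other ports allow,
        # mirroring the constraint described above instead of relying on the
        # namespace workaround.
        lowest = min(get_mtu(p) for p in bridge_ports) if bridge_ports else mtu
        if mtu > lowest:
            raise ValueError('bridge is already constrained to MTU %d' % lowest)
        subprocess.check_call(['ip', 'link', 'set', 'dev', device, 'mtu', str(mtu)])

    # Hypothetical usage: only bump qr-123 if the other br-int ports permit it.
    # safe_set_mtu('qr-123', 9000, ['tap-abc', 'tap-def'])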
Re: [openstack-dev] [neutron] OVSDB native interface as default in gate jobs
Maybe you just don't have enough pain, Sean :) I'd agree that these should
be coalesced, with a deprecation period, then...

E.

On Mon, Apr 4, 2016 at 2:32 PM, Sean M. Collins wrote:
> Inessa Vasilevskaya wrote:
> > different configurations of of_interface and ovsdb_interface options
> > (dsvm-fullstack [2] and rally tests are by now all I can think of).
>
> Wait, we have *two* different configuration options???
>
> WHY WHY WHY
>
> --
> Sean M. Collins

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] FW: [vitrage] Gerrit Upgrade 12/16
I'm sorry to say that, but the new front page design is horrible and totally confusing. I hope it'll change soon in the new release. E. On Tue, Dec 15, 2015 at 10:53 AM, AFEK, Ifat (Ifat) < ifat.a...@alcatel-lucent.com> wrote: > Hi, > > Reminder: Gerrit upgrade is scheduled for tomorrow at 17:00 UTC. > > Ifat. > > > -Original Message- > From: Spencer Krum [mailto:n...@spencerkrum.com] > Sent: Monday, December 14, 2015 9:53 PM > To: openstack-dev@lists.openstack.org > Subject: Re: [openstack-dev] Gerrit Upgrade 12/16 > > This is a gentle reminder that the downtime will be this Wednesday > starting at 17:00 UTC. > > Thank you for your patience, > Spencer > > -- > Spencer Krum > n...@spencerkrum.com > > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] neutron metadata-agent HA
It is as 'single' as the active L3 router that is handling traffic at the
current point in time.

On Sun, Dec 13, 2015 at 11:13 AM, Gary Kotton wrote:
>
> On 12/12/15, 10:44 PM, "Assaf Muller" wrote:
>
> >The neutron metadata agent is stateless. It takes requests from the
> >metadata proxies running in the router namespaces and moves the
> >requests on to the nova server. If you're using HA routers, start the
> >neutron-metadata-agent on every machine the L3 agent runs, and just
> >make sure that the metadata-agent is restarted in case it crashes and
> >you're done.
>
> So does this mean that it could be the single point of failure?
>
> >Nothing else you need to do.
> >
> >On Fri, Dec 11, 2015 at 3:24 PM, Fabrizio Soppelsa wrote:
> >>
> >> On Dec 10, 2015, at 12:56 AM, Alvise Dorigo wrote:
> >>
> >> So my question is: is there any progress on this topic? Is there a way
> >> (something like a cronjob script) to make the metadata-agent redundant
> >> without involving the clustering software Pacemaker/Corosync?
> >>
> >> Reason for such a dirty solution instead of relying on pacemaker?
> >>
> >> I’m not aware of such initiatives - I just checked the blueprints in
> >> Neutron and found none relevant. I can suggest filing a proposal on the
> >> corresponding launchpad page, elaborating your idea.
> >>
> >> F.

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [fuel] What to do when a controller runs out of space
On Tue, Oct 6, 2015 at 4:22 PM, Vladimir Kuklin <vkuk...@mirantis.com> wrote: > Eugene > > For example, each time that you need to have one instance (e.g. master > instance) of something non-stateless running in the cluster. > Right. This is theoretical. Practically, there are no such services among openstack. You are right that currently lots of things are fixed already - heat engine > is fine, for example. But I still see this issue with l3 agents and I will > not change my mind until we conduct complete scale and destructive testing > with new neutron code. > > Secondly, if we cannot reliably identify when to engage - then we need to > write the code that will tell us when to engage. If this code is already in > place and we can trigger a couple of commands to figure out Neutron agent > state, then we can add them to OCF script monitor and that is all. I agree > that we have some issues with our OCF scripts, for example some unoptimal > cleanup code that has issues with big scale, but I am almost sure we can > fix it. > > Finally, let me show an example of when you need a centralized cluster > manager to manage such situations - you have a temporary issue with > connectivity to neutron server over management network for some reason. > Your agents are not cleaned up and neutron server starts new l3 agent > instances on different node. In this case you will have IP duplication in > the network and will bring down the whole cluster as connectivity through > 'public' network will be working just fine. In case when we are using > Pacemaker - such node will be either fenced or will stop all the services > controlled by pacemaker as it is a part of non-quorate partition of the > cluster. When this happens, l3 agent OCF script will run its cleanup > section and purge all the stale IPs thus saving us from the trouble. I > obviously may be mistaking, so please correct me if this is not the case. > I think this deserves discussion in a separate thread, which I'll start soon. My initial point was (to state it clearly), that I will be -2 on any new additions of openstack services to pacemaker kingdom. Thanks, Eugene. > > > On Tue, Oct 6, 2015 at 3:46 PM, Eugene Nikanorov <enikano...@mirantis.com> > wrote: > >> >> >>> 2) I think you misunderstand what is the difference between >>> upstart/systemd and Pacemaker in this case. There are many cases when you >>> need to have syncrhonized view of the cluster. Otherwise you will hit >>> split-brain situations and have your cluster misfunctioning. Until >>> OpenStack provides us with such means there is no other way than using >>> Pacemaker/Zookeper/etc. >>> >> >> Could you please give some examples of those 'many cases' for openstack >> specifically? >> As for my 'misunderstanding' - openstack services only need to be always >> up, not more than that. >> Upstart does a perfect job there. >> >> >>> 3) Regarding Neutron agents - we discussed it many times - you need to >>> be able to control and clean up stuff after some service crashed. >>> Currently, Neutron does not provide reliable ways to do it. If your agent >>> dies and does not clean up ip addresses from the network namespace you will >>> get into the situation of ARP duplication which will be a kind of split >>> brain described in item #2. I personally as a system architect and >>> administrator do not believe for this to change in at least several years >>> for OpenStack so we will be using Pacemaker for a very long period of time. >>> >> >> This has been changed already, and a while ago. 
>> OCF infrastructure around neutron agents has never helped neutron in any >> meaningful way and is just an artifact from the dark past. >> The reasons are: pacemaker/ocf doesn't have enough intelligence to know >> when to engage, as a result, any cleanup could only be achieved through >> manual operations. I don't need to remind you how many bugs were in ocf >> scripts which brought whole clusters down after those manual operations. >> So it's just a way better to go with simple standard tools with >> fine-grain control. >> Same applies to any other openstack service (again, not rabbitmq/galera) >> >> > so we will be using Pacemaker for a very long period of time. >> Not for neutron, sorry. As soon as we finish the last bit of such >> cleanup, which is targeted for 8.0 >> >> Now, back to the topic - we may decide to use some more sophisticated >>> integral node health attribute which can be used with Pacemaker as well as >>> to put node into some kind of maintenance mode. We can lever
Re: [openstack-dev] [fuel] What to do when a controller runs out of space
> 2) I think you misunderstand what is the difference between > upstart/systemd and Pacemaker in this case. There are many cases when you > need to have syncrhonized view of the cluster. Otherwise you will hit > split-brain situations and have your cluster misfunctioning. Until > OpenStack provides us with such means there is no other way than using > Pacemaker/Zookeper/etc. > Could you please give some examples of those 'many cases' for openstack specifically? As for my 'misunderstanding' - openstack services only need to be always up, not more than that. Upstart does a perfect job there. > 3) Regarding Neutron agents - we discussed it many times - you need to be > able to control and clean up stuff after some service crashed. Currently, > Neutron does not provide reliable ways to do it. If your agent dies and > does not clean up ip addresses from the network namespace you will get into > the situation of ARP duplication which will be a kind of split brain > described in item #2. I personally as a system architect and administrator > do not believe for this to change in at least several years for OpenStack > so we will be using Pacemaker for a very long period of time. > This has been changed already, and a while ago. OCF infrastructure around neutron agents has never helped neutron in any meaningful way and is just an artifact from the dark past. The reasons are: pacemaker/ocf doesn't have enough intelligence to know when to engage, as a result, any cleanup could only be achieved through manual operations. I don't need to remind you how many bugs were in ocf scripts which brought whole clusters down after those manual operations. So it's just a way better to go with simple standard tools with fine-grain control. Same applies to any other openstack service (again, not rabbitmq/galera) > so we will be using Pacemaker for a very long period of time. Not for neutron, sorry. As soon as we finish the last bit of such cleanup, which is targeted for 8.0 Now, back to the topic - we may decide to use some more sophisticated > integral node health attribute which can be used with Pacemaker as well as > to put node into some kind of maintenance mode. We can leverage User > Maintenance Mode feature here or just simply stop particular services and > disable particular haproxy backends. > I think this kind of attribute, although being analyzed by pacemaker/ocf, doesn't need any new OS service to be put under pacemaker control. Thanks, Eugene. > > On Mon, Oct 5, 2015 at 11:57 PM, Eugene Nikanorov <enikano...@mirantis.com > > wrote: > >> >>>> >>> Mirantis does control neither Rabbitmq or Galera. Mirantis cannot assure >>> their quality as well. >>> >> >> Correct, and rabbitmq was always the pain in the back, preventing any *real >> *enterprise usage of openstack where reliability does matter. >> >> >>> > 2) it has terrible UX >>>> >>> >>> It looks like personal opinion. I'd like to see surveys or operators >>> feedbacks. Also, this statement is not constructive as it doesn't have >>> alternative solutions. >>> >> >> The solution is to get rid of terrible UX wherever possible (i'm not >> saying it is always possible, of course) >> upstart is just so much better. >> And yes, this is my personal opinion and is a summary of escalation >> team's experience. >> >> >>> >>>> > 3) it is not reliable >>>> >>> >>> I would say openstack services are not HA reliable. So OCF scripts are >>> reaction of operators on these problems. Many of them have child-ish issues >>> from release to release. 
Operators made OCF scripts to fix these problems. >>> A lot of openstack are stateful, so they require some kind of stickiness or >>> synchronization. Openstack services doesn't have simple health-check >>> functionality so it's hard to say it's running well or not. Sighup is still >>> a problem for many of openstack services. Etc/etc So, let's be constructive >>> here. >>> >> >> Well, I prefer to be responsible for what I know and maintain. Thus, I >> state that neutron doesn't need to be managed by pacemaker, neither server, >> nor all kinds of agents, and that's the path that neutron team will be >> taking. >> >> Thanks, >> Eugene. >> >>> >>> >>>> > >>>> >>>> I disagree with #1 as I do not agree that should be a criteria for an >>>> open-source project. Considering pacemaker is at the core of our >>>> controller setup, I would argue that if these are in fact true we need >>>> to be using som
Re: [openstack-dev] [fuel] What to do when a controller runs out of space
> > >> > Mirantis does control neither Rabbitmq or Galera. Mirantis cannot assure > their quality as well. > Correct, and rabbitmq was always the pain in the back, preventing any *real *enterprise usage of openstack where reliability does matter. > > 2) it has terrible UX >> > > It looks like personal opinion. I'd like to see surveys or operators > feedbacks. Also, this statement is not constructive as it doesn't have > alternative solutions. > The solution is to get rid of terrible UX wherever possible (i'm not saying it is always possible, of course) upstart is just so much better. And yes, this is my personal opinion and is a summary of escalation team's experience. > >> > 3) it is not reliable >> > > I would say openstack services are not HA reliable. So OCF scripts are > reaction of operators on these problems. Many of them have child-ish issues > from release to release. Operators made OCF scripts to fix these problems. > A lot of openstack are stateful, so they require some kind of stickiness or > synchronization. Openstack services doesn't have simple health-check > functionality so it's hard to say it's running well or not. Sighup is still > a problem for many of openstack services. Etc/etc So, let's be constructive > here. > Well, I prefer to be responsible for what I know and maintain. Thus, I state that neutron doesn't need to be managed by pacemaker, neither server, nor all kinds of agents, and that's the path that neutron team will be taking. Thanks, Eugene. > > >> > >> >> I disagree with #1 as I do not agree that should be a criteria for an >> open-source project. Considering pacemaker is at the core of our >> controller setup, I would argue that if these are in fact true we need >> to be using something else. I would agree that it is a terrible UX >> but all the clustering software I've used fall in this category. I'd >> like more information on how it is not reliable. Do we have numbers to >> backup these claims? >> >> > (3) is not evaluation of the project itself, but just a logical >> consequence >> > of (1) and (2). >> > As a part of escalation team I can say that it has cost our team >> thousands >> > of man hours of head-scratching, staring at pacemaker logs which value >> are >> > usually slightly below zero. >> > >> > Most of openstack services (in fact, ALL api servers) are stateless, >> they >> > don't require any cluster management (also, they don't need to be moved >> in >> > case of lack of space). >> > Statefull services like neutron agents have their states being a >> function of >> > db state and are able to syncronize it with the server without external >> > "help". >> > >> >> So it's not an issue with moving services so much as being able to >> stop the services when a condition is met. Have we tested all OS >> services to ensure they do function 100% when out of disk space? I >> would assume that glance might have issues with image uploads if there >> is no space to handle a request. >> >> > So now usage of pacemaker can be only justified for cases where >> service's >> > clustering mechanism requires active monitoring (rabbitmq, galera) >> > But even there, examples when we are better off without pacemaker are >> all >> > around. >> > >> > Thanks, >> > Eugene. >> > >> >> After I sent this email, I had further discussions around the issues >> that I'm facing and it may not be completely related to disk space. I >> think we might be relying on the expectation that the local rabbitmq >> is always available but I need to look into that. 
Either way, I >> believe we still should continue to discuss this issue as we are >> managing services in multiple ways on a single host. Additionally I do >> not believe that we really perform quality health checks on our >> services. >> >> Thanks, >> -Alex >> >> >> > >> > On Mon, Oct 5, 2015 at 1:34 PM, Sergey Vasilenko < >> svasile...@mirantis.com> >> > wrote: >> >> >> >> >> >> On Mon, Oct 5, 2015 at 12:22 PM, Eugene Nikanorov >> >> <enikano...@mirantis.com> wrote: >> >>> >> >>> No pacemaker for os services, please. >> >>> We'll be moving out neutron agents from pacemaker control in 8.0, >> other >> >>> os services don't need it too. >> >&g
Re: [openstack-dev] [fuel] What to do when a controller runs out of space
No pacemaker for os services, please. We'll be moving out neutron agents from pacemaker control in 8.0, other os services don't need it too. E. 5 окт. 2015 г. 12:01 пользователь "Sergii Golovatiuk" < sgolovat...@mirantis.com> написал: > Good morning gentlemen! > > Alex raised very good question. Thank you very much! We have 3 init > systems right now. Some services use SystemV, some services use upstart, > some services are under pacemaker. Personally, I would like to have > pacemaker as pid 1 to replace init [1]. However, I would like to remove > custom scripts as much as possible to leave only upstart/systemd classes > [2] only. That move will give fantastic flexibility to operators to control > their services. > > Concerning Haproxy checker, I think it should be done in different way. If > pacemaker/corosyunc has an issue the node should be fenced. > > Also, I would like to have pacemaker remote to control services on compute > nodes. It's very good replacement for monit. > > [1] https://www.youtube.com/watch?v=yq5nYPKxBCo > [2] > http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-supported.html > > > > -- > Best regards, > Sergii Golovatiuk, > Skype #golserge > IRC #holser > > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [fuel] What to do when a controller runs out of space
Ok. Project-wise:

1) Pacemaker is not under our company's control, so we can't assure its
   quality
2) it has terrible UX
3) it is not reliable

(3) is not an evaluation of the project itself, but just a logical
consequence of (1) and (2). As a part of the escalation team I can say that
it has cost our team thousands of man-hours of head-scratching, staring at
pacemaker logs whose value is usually slightly below zero.

Most openstack services (in fact, ALL API servers) are stateless; they
don't require any cluster management (also, they don't need to be moved in
case of lack of space). Stateful services like neutron agents have their
state as a function of the db state and are able to synchronize it with the
server without external "help".

So usage of pacemaker can only be justified for cases where a service's
clustering mechanism requires active monitoring (rabbitmq, galera). But
even there, examples where we are better off without pacemaker are all
around.

Thanks,
Eugene.

On Mon, Oct 5, 2015 at 1:34 PM, Sergey Vasilenko <svasile...@mirantis.com> wrote:
>
> On Mon, Oct 5, 2015 at 12:22 PM, Eugene Nikanorov <enikano...@mirantis.com> wrote:
>
>> No pacemaker for os services, please.
>> We'll be moving out neutron agents from pacemaker control in 8.0, other
>> os services don't need it too.
>
> could you please provide your arguments.
>
> /sv

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron] Issue with pymysql
Hi neutrons, I'd like to draw your attention to an issue discovered by rally gate job: http://logs.openstack.org/96/190796/4/check/gate-rally-dsvm-neutron-rally/7a18e43/logs/screen-q-svc.txt.gz?level=TRACE I don't have bandwidth to take a deep look at it, but first impression is that it is some issue with nested transaction support either on sqlalchemy or pymysql side. Also, besides errors with nested transactions, there are a lot of Lock wait timeouts. I think it makes sense to start with reverting the patch that moves to pymysql. Thanks, Eugene. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
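For anyone digging into that trace: the nested transactions in question are
SQLAlchemy SAVEPOINTs. A minimal illustration of the construct involved
follows; the model and connection URL are placeholders, and this is not a
reproduction of the gate failure itself:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Port(Base):
        # Placeholder model, not the real Neutron schema.
        __tablename__ = 'ports'
        id = Column(Integer, primary_key=True)
        name = Column(String(64))

    # pymysql is selected via the mysql+pymysql:// scheme; credentials are fake.
    engine = create_engine('mysql+pymysql://user:secret@localhost/test')
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    session = Session()
    session.add(Port(name='outer'))
    # begin_nested() emits SAVEPOINT; this is the "nested transaction"
    # support that the traceback points at.
    with session.begin_nested():
        session.add(Port(name='inner'))
    session.commit()
    session.close()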
Re: [openstack-dev] [all][python3] use of six.iteritems()
Huge +1 both for the suggestion and for the reasoning. It's better to avoid
substituting language features with a library.

Eugene.

On Tue, Jun 9, 2015 at 5:15 PM, Robert Collins robe...@robertcollins.net wrote:

I'm very glad folk are working on Python3 ports.

I'd like to call attention to one little wart in that process: I get the
feeling that folk are applying a massive regex to find things like
d.iteritems() and convert that to six.iteritems(d).

I'd very much prefer that such a regex approach move things to d.items(),
which is much easier to read.

Here's why. Firstly, very very very few of our dict iterations are going to
be performance sensitive in the way that iteritems() matters. Secondly, no
really - unless you're doing HUGE dicts, it doesn't matter. Thirdly.
Really, it doesn't.

At 1 million items the overhead is 54ms[1]. If we're doing inner loops on
million item dictionaries anywhere in OpenStack today, we have a problem.
We might want to in e.g. the scheduler... if it held in-memory state on a
million hypervisors at once, because I don't really want to imagine it
pulling a million rows from a DB on every action. But then, we'd be looking
at a whole 54ms. I think we could survive, if we did that (which we don't).

So - please, no six.iteritems().

Thanks,
Rob

[1]
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 76.6 msec per loop
python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
100 loops, best of 3: 22.6 msec per loop
python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 18.9 msec per loop
pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
10 loops, best of 3: 65.8 msec per loop
# and out of interest, assuming that that hadn't triggered the JIT but it had.
pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
1000 loops, best of 3: 64.3 msec per loop

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
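To make the requested change concrete, here is the before/after being asked
for (a tiny self-contained sketch, not taken from any particular OpenStack
patch):

    import six

    d = {'a': 1, 'b': 2}

    # Discouraged: the mechanical translation of d.iteritems().
    for key, value in six.iteritems(d):
        print(key, value)

    # Preferred: runs on Python 2 and 3 and is easier to read; the extra
    # list allocation on Python 2 is negligible for typical dict sizes.
    for key, value in d.items():
        print(key, value)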
Re: [openstack-dev] [Neutron] L3 agent rescheduling issue
Yes, 50-100 networks received by DHCP agent on startup could cause 2nd state report to be sent seconds after it should be sent. In my tests, if I recall correctly, it was ~70 networks and delay between 1st and 2nd state report around 25 seconds (while 5 sec was configured) Eugene. On Sun, Jun 7, 2015 at 11:11 PM, Kevin Benton blak...@gmail.com wrote: Well a greenthread will only yield when it makes a blocking call like writing to a network socket, file, etc. So once the report_state greenthread starts executing, it won't yield until it makes a call like that. I looked through the report_state code for the DHCP agent and the only blocking call it seems to make is the AMQP report_state call/cast itself. So even with a bunch of other workers, the report_state thread should get execution fairly quickly since most of our workers should yield very frequently when they make process calls, etc. That's why I assumed that there must be something actually stopping it from sending the message. Do you have a way to reproduce the issue with the DHCP agent? On Sun, Jun 7, 2015 at 9:21 PM, Eugene Nikanorov enikano...@mirantis.com wrote: No, I think greenthread itself don't do anything special, it's just when there are too many threads, state_report thread can't get the control for too long, since there is no prioritization of greenthreads. Eugene. On Sun, Jun 7, 2015 at 8:24 PM, Kevin Benton blak...@gmail.com wrote: I understand now. So the issue is that the report_state greenthread is just blocking and yielding whenever it tries to actually send a message? On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Salvatore, By 'fairness' I meant chances for state report greenthread to get the control. In DHCP case, each network processed by a separate greenthread, so the more greenthreads agent has, the less chances that report state greenthread will be able to report in time. Thanks, Eugene. On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando sorla...@nicira.com wrote: On 5 June 2015 at 01:29, Itsuro ODA o...@valinux.co.jp wrote: Hi, After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I have seen before. I thought the senario at that time as follows. * a lot of create/update resource API issued * rpc_conn_pool_size pool exhausted for sending notify and blocked farther sending side of RPC. * rpc_thread_pool_size pool exhausted by waiting rpc_conn_pool_size pool for replying RPC. * receiving state_report is blocked because rpc_thread_pool_size pool exhausted. I think this could be a good explanation couldn't it? Kevin proved that the periodic tasks are not mutually exclusive and that long process times for sync_routers are not an issue. However, he correctly suspected a server-side involvement, which could actually be a lot of requests saturating the RPC pool. On the other hand, how could we use this theory to explain why this issue tend to occur when the agent is restarted? Also, Eugene, what do you mean by stating that the issue could be in agent's fairness? Salvatore Thanks Itsuro Oda On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton blak...@gmail.com wrote: After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I set the report_interval to 1 second on the agent and added a logging statement and I see a report every 1 second even when sync_routers is taking a really long time. 
On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin c...@ecbaldwin.net wrote: Ann, Thanks for bringing this up. It has been on the shelf for a while now. Carl On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando sorla...@nicira.com wrote: One reason for not sending the heartbeat from a separate greenthread could be that the agent is already doing it [1]. The current proposed patch addresses the issue blindly - that is to say before declaring an agent dead let's wait for some more time because it could be stuck doing stuff. In that case I would probably make the multiplier (currently 2x) configurable. The reason for which state report does not occur is probably that both it and the resync procedure are periodic tasks. If I got it right they're both executed as eventlet greenthreads but one at a time. Perhaps then adding an initial delay to the full sync task might ensure the first thing an agent does when it comes up is sending a heartbeat to the server? On the other hand, while doing the initial full resync, is the agent able to process updates? If not perhaps it makes sense to have it down until it finishes synchronisation. Yes, it can! The agent prioritizes updates from RPC over full resync
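The starvation effect being discussed is easy to reproduce outside of
Neutron. A toy sketch follows; the interval and the worker count are
arbitrary stand-ins for the agent's report_state loop and its per-network
processing, not actual agent code:

    import time

    import eventlet

    REPORT_INTERVAL = 1.0

    def report_state():
        # Stand-in for the agent's periodic state report.
        last = time.time()
        while True:
            eventlet.sleep(REPORT_INTERVAL)
            now = time.time()
            print('report_state ran %.2fs after the previous one' % (now - last))
            last = now

    def busy_worker():
        # Stand-in for per-network/per-router processing that rarely yields.
        while True:
            for _ in range(10 ** 6):   # CPU-bound chunk with no yield points
                pass
            eventlet.sleep(0)          # cooperative yield only between chunks

    pool = eventlet.GreenPool()
    pool.spawn(report_state)
    for _ in range(70):                # roughly the ~70 networks mentioned above
        pool.spawn(busy_worker)
    pool.waitall()                     # runs until interrupted (Ctrl-C)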
Re: [openstack-dev] [Neutron] L3 agent rescheduling issue
Salvatore, By 'fairness' I meant chances for state report greenthread to get the control. In DHCP case, each network processed by a separate greenthread, so the more greenthreads agent has, the less chances that report state greenthread will be able to report in time. Thanks, Eugene. On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando sorla...@nicira.com wrote: On 5 June 2015 at 01:29, Itsuro ODA o...@valinux.co.jp wrote: Hi, After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I have seen before. I thought the senario at that time as follows. * a lot of create/update resource API issued * rpc_conn_pool_size pool exhausted for sending notify and blocked farther sending side of RPC. * rpc_thread_pool_size pool exhausted by waiting rpc_conn_pool_size pool for replying RPC. * receiving state_report is blocked because rpc_thread_pool_size pool exhausted. I think this could be a good explanation couldn't it? Kevin proved that the periodic tasks are not mutually exclusive and that long process times for sync_routers are not an issue. However, he correctly suspected a server-side involvement, which could actually be a lot of requests saturating the RPC pool. On the other hand, how could we use this theory to explain why this issue tend to occur when the agent is restarted? Also, Eugene, what do you mean by stating that the issue could be in agent's fairness? Salvatore Thanks Itsuro Oda On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton blak...@gmail.com wrote: After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I set the report_interval to 1 second on the agent and added a logging statement and I see a report every 1 second even when sync_routers is taking a really long time. On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin c...@ecbaldwin.net wrote: Ann, Thanks for bringing this up. It has been on the shelf for a while now. Carl On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando sorla...@nicira.com wrote: One reason for not sending the heartbeat from a separate greenthread could be that the agent is already doing it [1]. The current proposed patch addresses the issue blindly - that is to say before declaring an agent dead let's wait for some more time because it could be stuck doing stuff. In that case I would probably make the multiplier (currently 2x) configurable. The reason for which state report does not occur is probably that both it and the resync procedure are periodic tasks. If I got it right they're both executed as eventlet greenthreads but one at a time. Perhaps then adding an initial delay to the full sync task might ensure the first thing an agent does when it comes up is sending a heartbeat to the server? On the other hand, while doing the initial full resync, is the agent able to process updates? If not perhaps it makes sense to have it down until it finishes synchronisation. Yes, it can! The agent prioritizes updates from RPC over full resync activities. I wonder if the agent should check how long it has been since its last state report each time it finishes processing an update for a router. It normally doesn't take very long (relatively) to process an update to a single router. I still would like to know why the thread to report state is being starved. Anyone have any insight on this? I thought that with all the system calls, the greenthreads would yield often. 
There must be something I don't understand about it. Carl __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Kevin Benton -- Itsuro ODA o...@valinux.co.jp __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
Re: [openstack-dev] [Neutron] L3 agent rescheduling issue
No, I think greenthread itself don't do anything special, it's just when there are too many threads, state_report thread can't get the control for too long, since there is no prioritization of greenthreads. Eugene. On Sun, Jun 7, 2015 at 8:24 PM, Kevin Benton blak...@gmail.com wrote: I understand now. So the issue is that the report_state greenthread is just blocking and yielding whenever it tries to actually send a message? On Sun, Jun 7, 2015 at 8:10 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Salvatore, By 'fairness' I meant chances for state report greenthread to get the control. In DHCP case, each network processed by a separate greenthread, so the more greenthreads agent has, the less chances that report state greenthread will be able to report in time. Thanks, Eugene. On Sun, Jun 7, 2015 at 4:15 AM, Salvatore Orlando sorla...@nicira.com wrote: On 5 June 2015 at 01:29, Itsuro ODA o...@valinux.co.jp wrote: Hi, After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I have seen before. I thought the senario at that time as follows. * a lot of create/update resource API issued * rpc_conn_pool_size pool exhausted for sending notify and blocked farther sending side of RPC. * rpc_thread_pool_size pool exhausted by waiting rpc_conn_pool_size pool for replying RPC. * receiving state_report is blocked because rpc_thread_pool_size pool exhausted. I think this could be a good explanation couldn't it? Kevin proved that the periodic tasks are not mutually exclusive and that long process times for sync_routers are not an issue. However, he correctly suspected a server-side involvement, which could actually be a lot of requests saturating the RPC pool. On the other hand, how could we use this theory to explain why this issue tend to occur when the agent is restarted? Also, Eugene, what do you mean by stating that the issue could be in agent's fairness? Salvatore Thanks Itsuro Oda On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton blak...@gmail.com wrote: After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I set the report_interval to 1 second on the agent and added a logging statement and I see a report every 1 second even when sync_routers is taking a really long time. On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin c...@ecbaldwin.net wrote: Ann, Thanks for bringing this up. It has been on the shelf for a while now. Carl On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando sorla...@nicira.com wrote: One reason for not sending the heartbeat from a separate greenthread could be that the agent is already doing it [1]. The current proposed patch addresses the issue blindly - that is to say before declaring an agent dead let's wait for some more time because it could be stuck doing stuff. In that case I would probably make the multiplier (currently 2x) configurable. The reason for which state report does not occur is probably that both it and the resync procedure are periodic tasks. If I got it right they're both executed as eventlet greenthreads but one at a time. Perhaps then adding an initial delay to the full sync task might ensure the first thing an agent does when it comes up is sending a heartbeat to the server? On the other hand, while doing the initial full resync, is the agent able to process updates? If not perhaps it makes sense to have it down until it finishes synchronisation. Yes, it can! 
The agent prioritizes updates from RPC over full resync activities. I wonder if the agent should check how long it has been since its last state report each time it finishes processing an update for a router. It normally doesn't take very long (relatively) to process an update to a single router. I still would like to know why the thread to report state is being starved. Anyone have any insight on this? I thought that with all the system calls, the greenthreads would yield often. There must be something I don't understand about it. Carl __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Kevin Benton -- Itsuro ODA o...@valinux.co.jp __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: [openstack-dev] [Neutron] L3 agent rescheduling issue
I doubt it's a server side issue. Usually there are plenty of rpc workers to drain much higher amount of rpc messages going from agents. So the issue could be in 'fairness' on L3 agent side. But from my observations it was more an issue of DHCP agent than L3 agent due to difference in resource processing. Thanks, Eugene. On Thu, Jun 4, 2015 at 4:29 PM, Itsuro ODA o...@valinux.co.jp wrote: Hi, After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I have seen before. I thought the senario at that time as follows. * a lot of create/update resource API issued * rpc_conn_pool_size pool exhausted for sending notify and blocked farther sending side of RPC. * rpc_thread_pool_size pool exhausted by waiting rpc_conn_pool_size pool for replying RPC. * receiving state_report is blocked because rpc_thread_pool_size pool exhausted. Thanks Itsuro Oda On Thu, 4 Jun 2015 14:20:33 -0700 Kevin Benton blak...@gmail.com wrote: After trying to reproduce this, I'm suspecting that the issue is actually on the server side from failing to drain the agent report state queue in time. I set the report_interval to 1 second on the agent and added a logging statement and I see a report every 1 second even when sync_routers is taking a really long time. On Thu, Jun 4, 2015 at 11:52 AM, Carl Baldwin c...@ecbaldwin.net wrote: Ann, Thanks for bringing this up. It has been on the shelf for a while now. Carl On Thu, Jun 4, 2015 at 8:54 AM, Salvatore Orlando sorla...@nicira.com wrote: One reason for not sending the heartbeat from a separate greenthread could be that the agent is already doing it [1]. The current proposed patch addresses the issue blindly - that is to say before declaring an agent dead let's wait for some more time because it could be stuck doing stuff. In that case I would probably make the multiplier (currently 2x) configurable. The reason for which state report does not occur is probably that both it and the resync procedure are periodic tasks. If I got it right they're both executed as eventlet greenthreads but one at a time. Perhaps then adding an initial delay to the full sync task might ensure the first thing an agent does when it comes up is sending a heartbeat to the server? On the other hand, while doing the initial full resync, is the agent able to process updates? If not perhaps it makes sense to have it down until it finishes synchronisation. Yes, it can! The agent prioritizes updates from RPC over full resync activities. I wonder if the agent should check how long it has been since its last state report each time it finishes processing an update for a router. It normally doesn't take very long (relatively) to process an update to a single router. I still would like to know why the thread to report state is being starved. Anyone have any insight on this? I thought that with all the system calls, the greenthreads would yield often. There must be something I don't understand about it. 
Carl __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Kevin Benton -- Itsuro ODA o...@valinux.co.jp __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
All production Openstack applications today are fully serialized to only be able to emit a single query to the database at a time; True. That's why any deployment configures tons (tens) of workers of any significant service. When I talk about moving to threads, this is not a won't help or hurt kind of issue, at the moment it's a change that will immediately allow massive improvement to the performance of all Openstack applications instantly. Not sure If it will give much benefit over separate processes. I guess we don't configure many worker for gate testing (at least, neutron still doesn't do it), so there could be an improvement, but I guess to enable multithreading we would need to fix the same issues that prevented us from configuring multiple workers in the gate, plus possibly more. We need to change the DB library or dump eventlet. I'm +1 for the 1st option. Other option, which is multithreading will most certainly bring concurrency issues other than database. Thanks, Eugene. On Mon, May 11, 2015 at 4:46 PM, Boris Pavlovic bo...@pavlovic.me wrote: Mike, Thank you for saying all that you said above. Best regards, Boris Pavlovic On Tue, May 12, 2015 at 2:35 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Mike Bayer's message of 2015-05-11 15:44:30 -0700: On 5/11/15 5:25 PM, Robert Collins wrote: Details: Skip over this bit if you know it all already. The GIL plays a big factor here: if you want to scale the amount of CPU available to a Python service, you have two routes: A) move work to a different process through some RPC - be that DB's using SQL, other services using oslo.messaging or HTTP - whatever. B) use C extensions to perform work in threads - e.g. openssl context processing. To increase concurrency you can use threads, eventlet, asyncio, twisted etc - because within a single process *all* Python bytecode execution happens inside the GIL lock, so you get at most one CPU for a CPU bound workload. For an IO bound workload, you can fit more work in by context switching within that one CPU capacity. And - the GIL is a poor scheduler, so at the limit - an IO bound workload where the IO backend has more capacity than we have CPU to consume it within our process, you will run into priority inversion and other problems. [This varies by Python release too]. request_duration = time_in_cpu + time_blocked request_cpu_utilisation = time_in_cpu/request_duration cpu_utilisation = concurrency * request_cpu_utilisation Assuming that we don't want any one process to spend a lot of time at 100% - to avoid such at-the-limit issues, lets pick say 80% utilisation, or a safety factor of 0.2. If a single request consumes 50% of its duration waiting on IO, and 50% of its duration executing bytecode, we can only run one such request concurrently without hitting 100% utilisations. (2*0.5 CPU == 1). For a request that spends 75% of its duration waiting on IO and 25% on CPU, we can run 3 such requests concurrently without exceeding our target of 80% utilisation: (3*0.25=0.75). What we have today in our standard architecture for OpenStack is optimised for IO bound workloads: waiting on the network/subprocesses/disk/libvirt etc. Running high numbers of eventlet handlers in a single process only works when the majority of the work being done by a handler is IO. Everything stated here is great, however in our situation there is one unfortunate fact which renders it completely incorrect at the moment. 
I'm still puzzled why we are getting into deep think sessions about the vagaries of the GIL and async when there is essentially a full-on red-alert performance blocker rendering all of this discussion useless, so I must again remind us: what we have *today* in Openstack is *as completely un-optimized as you can possibly be*. The most GIL-heavy nightmare CPU bound task you can imagine running on 25 threads on a ten year old Pentium will run better than the Openstack we have today, because we are running a C-based, non-eventlet patched DB library within a single OS thread that happens to use eventlet, but the use of eventlet is totally pointless because right now it blocks completely on all database IO. All production Openstack applications today are fully serialized to only be able to emit a single query to the database at a time; for each message sent, the entire application blocks an order of magnitude more than it would under the GIL waiting for the database library to send a message to MySQL, waiting for MySQL to send a response including the full results, waiting for the database to unwrap the response into Python structures, and finally back to the Python space, where we can send another database message and block the entire application and all greenlets while this single message proceeds. To share a link I've already shared
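The central claim above, that a blocking call inside a C extension stalls
every greenlet in the process, can be demonstrated with a few
self-contained lines. The sleep below is only a stand-in for a native DB
driver round trip; no real driver API is being shown:

    import time

    import eventlet

    def fake_db_call(n):
        # time.sleep here is NOT monkey-patched, so it blocks the whole
        # process exactly the way a C database driver call does.
        time.sleep(1)
        print('query %d finished' % n)

    pool = eventlet.GreenPool()
    start = time.time()
    for i in range(5):
        pool.spawn(fake_db_call, i)
    pool.waitall()
    print('5 "queries" took %.1fs' % (time.time() - start))
    # Prints ~5s: the greenthreads ran strictly one after another because
    # the blocking call never yields to the hub. A pure-Python driver on a
    # monkey-patched socket would let them overlap.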
[openstack-dev] [Neutron] A10 CI
Hi folks,

Who is responsible for the A10 CI? It keeps spamming CI results on many
patches that are not being updated. It also spins sometimes, producing tens
and hundreds of emails for a particular patch.

Please repair!

Thanks,
Eugene.

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all][oslo.db] Repeatable Read considered harmful
Hi Matthew, I'll add just 2c: We've tried to move from repeatable-read to read committed in Neutron project. This change actually has caused multiple deadlocks during regular tempest test run. That is a known problem (the issue with eventlet and currect mysql client library), but anyway, at least one major openstack project is not ready to move to read-committed. Also, particular transaction isolation level's performance is highly affected by DB usage pattern. Is there any research of how read-committed affects performance of openstack projects? Thanks, Eugene. On Fri, Feb 6, 2015 at 7:59 PM, Matthew Booth mbo...@redhat.com wrote: I was surprised recently to discover that MySQL uses repeatable read for transactions by default. Postgres uses read committed by default, and SQLite uses serializable. We don't set the isolation level explicitly anywhere, so our applications are running under different isolation levels depending on backend. This doesn't sound like a good idea to me. It's one thing to support multiple sql syntaxes, but different isolation levels have different semantics. Supporting that is much harder, and currently we're not even trying. I'm aware that the same isolation level on different databases will still have subtly different semantics, but at least they should agree on the big things. I think we should pick one, and it should be read committed. Also note that 'repeatable read' on both MySQL and Postgres is actually snapshot isolation, which isn't quite the same thing. For example, it doesn't get phantom reads. The most important reason I think we need read committed is recovery from concurrent changes within the scope of a single transaction. To date, in Nova at least, this hasn't been an issue as transactions have had an extremely small scope. However, we're trying to expand that scope with the new enginefacade in oslo.db: https://review.openstack.org/#/c/138215/ . With this expanded scope, transaction failure in a library function can't simply be replayed because the transaction scope is larger than the function. So, 3 concrete examples of how repeatable read will make Nova worse: * https://review.openstack.org/#/c/140622/ This was committed to Nova recently. Note how it involves a retry in the case of concurrent change. This works fine, because the retry is creates a new transaction. However, if the transaction was larger than the scope of this function this would not work, because each iteration would continue to read the old data. The solution to this is to create a new transaction. However, because the transaction is outside of the scope of this function, the only thing we can do locally is fail. The caller then has to re-execute the whole transaction, or fail itself. This is a local concurrency problem which can be very easily handled locally, but not if we're using repeatable read. * https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L4749 Nova has multiple functions of this type which attempt to update a key/value metadata table. I'd expect to find multiple concurrency issues with this if I stopped to give it enough thought, but concentrating just on what's there, notice how the retry loop starts a new transaction. If we want to get to a place where we don't do that, with repeatable read we're left failing the whole transaction. * https://review.openstack.org/#/c/136409/ This one isn't upstream, yet. It's broken, and I can't currently think of a solution if we're using repeatable read. The issue is atomic creation of a shared resource. 
We want to handle a creation race safely. This patch: * Attempts to read the default (it will normally exist) * Creates a new one if it doesn't exist * Goes back to the start if creation failed due to a duplicate Seems fine, but it will fail because the re-read will continue to not return the new value under repeatable read (no phantom reads). The only way to see the new row is a new transaction. As this will no longer be in the scope of this function, the only solution will be to fail. Read committed could continue without failing. Incidentally, this currently works by using multiple transactions, which we are trying to avoid. It has also been suggested that in this specific instance the default security group could be created with the project. However, that would both be more complicated, because it would require putting a hook into another piece of code, and less robust, because it wouldn't recover if somebody deleted the default security group. To summarise, with repeatable read we're forced to abort the current transaction to deal with certain relatively common classes of concurrency issue, whereas with read committed we can safely recover. If we want to reduce the number of transactions we're using, which we do, the impact of this is going to dramatically increase. We should standardise on read committed. Matt -- Matthew
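A minimal sketch of the snapshot behaviour this last example runs into, assuming MySQL/InnoDB and SQLAlchemy; the connection URL and the pre-existing 'security_groups' table are illustrative assumptions, not Nova code. Under the default REPEATABLE READ, the second SELECT in the same transaction cannot see a row committed by another session in between, so a retry that re-reads inside one transaction can never observe the concurrently created default; under READ COMMITTED it can.

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://user:pass@localhost/demo')  # assumption

    def reread_after_concurrent_insert():
        with engine.connect() as reader, engine.connect() as writer:
            with reader.begin():
                first = reader.execute(text(
                    "SELECT id FROM security_groups WHERE name = 'default'"
                )).fetchall()
                # Another session creates and commits the row while our
                # transaction is still open.
                with writer.begin():
                    writer.execute(text(
                        "INSERT INTO security_groups (name) VALUES ('default')"))
                second = reader.execute(text(
                    "SELECT id FROM security_groups WHERE name = 'default'"
                )).fetchall()
        # REPEATABLE READ: first == second == []  (retrying inside the
        #                  transaction is futile)
        # READ COMMITTED:  second contains the new row, so the retry can succeed
        return first, second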
Re: [openstack-dev] [Neutron] db-level locks, non-blocking algorithms, active/active DB clusters and IPAM
Thanks for putting this all together, Salvatore. I just want to comment on this suggestion: 1) Move the allocation logic out of the driver, thus making IPAM an independent service. The API workers will then communicate with the IPAM service through a message bus, where IP allocation requests will be naturally serialized Right now port creation is already a distributed process involving several parties. Adding one more actor outside Neutron which can be communicated with message bus just to serialize requests makes me think of how terrible troubleshooting could be in case of applied load, when communication over mq slows down or interrupts. Not to say such service would be SPoF and a contention point. So, this of course could be an option, but personally I'd not like to see it as a default. Thanks, Eugene. On Wed, Feb 25, 2015 at 4:35 AM, Robert Collins robe...@robertcollins.net wrote: On 24 February 2015 at 01:07, Salvatore Orlando sorla...@nicira.com wrote: Lazy-Stacker summary: ... In the medium term, there are a few things we might consider for Neutron's built-in IPAM. 1) Move the allocation logic out of the driver, thus making IPAM an independent service. The API workers will then communicate with the IPAM service through a message bus, where IP allocation requests will be naturally serialized 2) Use 3-party software as dogpile, zookeeper but even memcached to implement distributed coordination. I have nothing against it, and I reckon Neutron can only benefit for it (in case you're considering of arguing that it does not scale, please also provide solid arguments to support your claim!). Nevertheless, I do believe API request processing should proceed undisturbed as much as possible. If processing an API requests requires distributed coordination among several components then it probably means that an asynchronous paradigm is more suitable for that API request. So data is great. It sounds like as long as we have an appropriate retry decorator in place, that write locks are better here, at least for up to 30 threads. But can we trust the data? One thing I'm not clear on is the SQL statement count. You say 100 queries for A-1 with a time on Galera of 0.06*1.2=0.072 seconds per allocation ? So is that 2 queries over 50 allocations over 20 threads? I'm not clear on what the request parameter in the test json files does, and AFAICT your threads each do one request each. As such I suspect that you may be seeing less concurrency - and thus contention - than real-world setups where APIs are deployed to run worker processes in separate processes and requests are coming in willy-nilly. The size of each algorithms workload is so small that its feasible to imagine the thread completing before the GIL bytecount code trigger (see https://docs.python.org/2/library/sys.html#sys.setcheckinterval) and the GIL's lack of fairness would exacerbate that. If I may suggest: - use multiprocessing or some other worker-pool approach rather than threads - or set setcheckinterval down low (e.g. to 20 or something) - do multiple units of work (in separate transactions) within each worker, aim for e.g. 10 seconds or work or some such. - log with enough detail that we can report on the actual concurrency achieved. E.g. log the time in us when each transaction starts and finishes, then we can assess how many concurrent requests were actually running. If the results are still the same - great, full steam ahead. 
If not, well lets revisit :) -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
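To make Rob's suggestion concrete, here is a rough sketch of such a harness, assuming a pool of worker processes rather than threads; allocate_ip() is a placeholder for whatever IPAM call is being benchmarked, and the 10-second budget and 20 workers are arbitrary. Each worker performs many allocations, each in its own transaction, and logs per-transaction start/finish timestamps in microseconds so the achieved concurrency can be computed afterwards.

    import multiprocessing
    import time

    def allocate_ip(worker_id, i):
        # Placeholder: one allocation, performed in its own DB transaction.
        pass

    def worker(worker_id, duration, log):
        records = []
        deadline = time.time() + duration          # aim for ~10 seconds of work
        i = 0
        while time.time() < deadline:
            start_us = int(time.time() * 1e6)
            allocate_ip(worker_id, i)
            end_us = int(time.time() * 1e6)
            records.append((worker_id, i, start_us, end_us))
            i += 1
        log.extend(records)

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        log = manager.list()
        procs = [multiprocessing.Process(target=worker, args=(w, 10.0, log))
                 for w in range(20)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Post-processing (not shown): count overlapping [start, end] intervals
        # to report how many transactions were actually in flight at once.
        print(len(log), "transactions recorded")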
Re: [openstack-dev] [Neutron] Bug day
Hi again, I'd like to share some results of the bug day we've conducted on 2014-11-21. Stats: - 73 New bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=New - 795 Open bugs https://bugs.launchpad.net/neutron/+bugs - 285 In-progress bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=In+Progress I personally went over some opened/new bugs with High/Medium importance, trying to detect duplicates, get rid of some bugs that were not active for too long (like ones that have been filed back in 2013), pinging submitters to provide more info and such. I've also moved some bugs from 'In progress' to 'New' or 'Confirmed' and removed assignees if their submitted patches were either abandoned or have not been updated for months. So don't be surprised if I've removed someone. As Russel Bryant has mentioned, assignment might potentially discourage people from looking into the bug. Thanks everyone for helping with this! Eugene. On Fri, Nov 21, 2014 at 11:03 AM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi neutron folks! Today we've decided to conduct bug triaging day. We have more than one thousand bugs needing their state to be checked. So everyone is welcome to participate! The goals of bug triaging day are: 1) Decrease the number of New bugs. Possible 'resolution' would be: - confirm bug. If you see the issue in the code, or you can reproduce it - mark as Incomplete. Bug description doesn't contain sufficient information to triage the bug. - mark as Invalid. Not a bug, or we're not going to fix it. - mark as duplicate. If you know that other bug filed earlier is describing the same issue. - mark as Fix committed if you know that the issue was fixed. It's good if you could provide a link to corresponding review. 2) Check the Open and In progress bugs. If the last activity on the bug happened more than a month ago - it makes sense sometimes to bring it back to 'New'. By activity I mean comments in the bug, actively maintained patch on review, and such. Of course feel free to assign a bug to yourself if you know how and going to fix it. Some statistics: - 85 New bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=New - 811 Open bugs https://bugs.launchpad.net/neutron/+bugs - 331 In-progress bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=In+Progress Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron] Bug day
Hi neutron folks! Today we've decided to conduct bug triaging day. We have more than one thousand bugs needing their state to be checked. So everyone is welcome to participate! The goals of bug triaging day are: 1) Decrease the number of New bugs. Possible 'resolution' would be: - confirm bug. If you see the issue in the code, or you can reproduce it - mark as Incomplete. Bug description doesn't contain sufficient information to triage the bug. - mark as Invalid. Not a bug, or we're not going to fix it. - mark as duplicate. If you know that other bug filed earlier is describing the same issue. - mark as Fix committed if you know that the issue was fixed. It's good if you could provide a link to corresponding review. 2) Check the Open and In progress bugs. If the last activity on the bug happened more than a month ago - it makes sense sometimes to bring it back to 'New'. By activity I mean comments in the bug, actively maintained patch on review, and such. Of course feel free to assign a bug to yourself if you know how and going to fix it. Some statistics: - 85 New bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=New - 811 Open bugs https://bugs.launchpad.net/neutron/+bugs - 331 In-progress bugs https://bugs.launchpad.net/neutron/+bugs?search=Searchfield.status=In+Progress Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions
Comments inline: On Thu, Nov 20, 2014 at 4:34 AM, Jay Pipes jaypi...@gmail.com wrote: So while the SELECTs may return different data on successive calls when you use the READ COMMITTED isolation level, the UPDATE statements will continue to return 0 rows affected **if they attempt to change rows that have been changed since the start of the transaction** The reason that changing the isolation level to READ COMMITTED appears to work for the code in question: https://github.com/openstack/neutron/blob/master/neutron/ plugins/ml2/drivers/helpers.py#L98 is likely because the SELECT ... LIMIT 1 query is returning a different row on successive attempts (though since there is no ORDER BY on the query, the returned row of the query is entirely unpredictable (line 112)). Since data from that returned row is used in the UPDATE statement (line 118 and 124), *different* rows are actually being changed by successive UPDATE statements. Not really, we're updating the same row we've selected. It's ensured by 'raw_segment' which actually contains 'gre_id' (or similar) attribute. So in each iteration we're working with the same row, but in different iterations READ COMMITTED allows us to see different data and hence work with a different row. What this means is that for this *very particular* case, setting the transaction isolation level to READ COMMITTTED will work presumably most of the time on MySQL, but it's not an appropriate solution for the generalized problem domain of the SELECT FOR UPDATE. If you need to issue a SELECT and an UPDATE in a retry loop, and you are attempting to update the same row or rows (for instance, in the quota reservation or resource allocation scenarios), this solution will not work, even with READ COMMITTED. This is why I say it's not really appropriate, and a better general solution is to use separate transactions for each loop in the retry mechanic. By saying 'this solution will not work' what issues do you mean what exactly? Btw, I agree on using separate transaction for each loop, the problem is that transaction is usually not 'local' to the method where the retry loop resides. The issue is about doing the retry within a single transaction. That's not what I recommend doing. I recommend instead doing short separate transactions instead of long-lived, multi-statement transactions and relying on the behaviour of the DB's isolation level (default or otherwise) to solve the problem of reading changes to a record that you intend to update. instead of long-lived, multi-statement transactions - that's really what would require quite large code redesign. So far finding a way to bring retry logic upper to the stack of nesting transactions seems more appropriate. Thanks, Eugene. Cheers, -jay Also, thanks Clint for clarification about example scenario described by Mike Bayer. Initially the issue was discovered with concurrent tests on multi master environment with galera as a DB backend. Thanks, Eugene On Thu, Nov 20, 2014 at 12:20 AM, Mike Bayer mba...@redhat.com mailto:mba...@redhat.com wrote: On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com mailto:rmo...@us.ibm.com wrote: BTW, I view your examples from oslo as helping make my argument for me (and I don't think that was your intent :) ) I disagree with that as IMHO the differences in producing MM in the app layer against arbitrary backends (Postgresql vs. DB2 vs. MariaDB vs. ???) will incur a lot more “bifurcation” than a system that targets only a handful of existing MM solutions. 
The example I referred to in oslo.db is dealing with distinct, non MM backends. That level of DB-specific code and more is a given if we are building a MM system against multiple backends generically. It’s not possible to say which approach would be better or worse at the level of “how much database specific application logic do we need”, though in my experience, no matter what one is trying to do, the answer is always, “tons”; we’re dealing not just with databases but also Python drivers that have a vast amount of differences in behaviors, at every level.On top of all of that, hand-rolled MM adds just that much more application code to be developed and maintained, which also claims it will do a better job than mature (ish?) database systems designed to do the same job against a specific backend. My reason for asking this question here is that if the community wants to consider #2, then these problems are the place to start crafting that solution - if we solve the conflicts inherent with the two conncurrent thread scenarios, then I think we will find that we've solved the multi-master problem essentially for free”. Maybe I’m missing something, if we learn how to write out a row such that a concurrent transaction
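For reference, the helpers.py loop debated earlier in this thread boils down to something like the following sketch; the table and column names are made up and this is a simplification of the real ml2 code. The key point is the interplay with isolation: if the whole loop runs inside one REPEATABLE READ transaction, every SELECT re-reads the same snapshot, while with READ COMMITTED (or a fresh transaction per attempt) each iteration can pick a row that is still genuinely free.

    from sqlalchemy import text

    def allocate_segment(conn, retries=10):
        for _ in range(retries):
            row = conn.execute(text(
                "SELECT id FROM gre_allocations "
                "WHERE allocated = false LIMIT 1")).first()
            if row is None:
                raise Exception("no free segments")
            result = conn.execute(text(
                "UPDATE gre_allocations SET allocated = true "
                "WHERE id = :id AND allocated = false"), {"id": row.id})
            if result.rowcount == 1:
                return row.id        # we won the race for this particular row
            # rowcount == 0: another worker claimed it first; pick another row
        raise Exception("could not allocate a segment")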
Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions
*You are 100% correct that setting the transaction isolation level to READ COMMITTED works in the retry loop*. I stand corrected, and humbled :) Please accept my apologies. Thanks for letting me know :) One thing I did note, though, is that changing the isolation level of an *already-started transaction* does not change the current transaction's isolation level -- the new isolation level only takes effect once the previously started transaction is committed or rolled back. So, on line 107 in your proposed patch here: https://review.openstack.org/#/c/129288/5/neutron/plugins/ ml2/drivers/helpers.py From what I could find out in my research, the setting of the isolation level needs to be done *outside* of the session.begin() call, otherwise the isolation level will not take effect until that transaction is committed or rolled back. You're right. Apparently I've misread sqlalchemy docs at some point. Now I see I've understood them incorrectly. Also, from some sqlalchemy code inspection I thought that default engine isolation level is restored after current transaction is committed, but it's not so as well. Of course, if SQLAlchemy is doing some auto-commit or something in the session, then you may not see this affect, but I certainly was able to see this in my testing in mysql client sessions... so I'm a little perplexed as to how your code works on already-started transactions. The documentation on the MySQL site backs up what I observed: http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html ...the statement sets the default transaction level for all subsequent transactions performed within the current session. That basically means that my patch effectively changes isolation level of every connection that happens to be used for the session. All the best, and thanks for the informative lesson of the week! Thank you for getting to the very bottom of it! :) Eugene. -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
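For completeness, the way to get a READ COMMITTED scope without silently changing the session default is to set the level on the connection before any transaction has begun; a sketch with a reasonably recent SQLAlchemy (the connection URL is an assumption, and whether the level is reset when the connection goes back to the pool is worth verifying for the version in use):

    from sqlalchemy import create_engine

    engine = create_engine('mysql+pymysql://user:pass@localhost/neutron')  # assumption

    def run_read_committed(work):
        with engine.connect() as conn:
            # Must be applied while no transaction is in progress
            # on this connection.
            conn = conn.execution_options(isolation_level="READ COMMITTED")
            with conn.begin():
                return work(conn)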
Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions
Wow, lots of feedback in a matter of hours. First of all, reading postgres docs I see that READ COMMITTED is the same as for mysql, so it should address the issue we're discussing: *Read Committed* is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) *sees only data committed before the query began (not before TX began - Eugene)*; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. *Also note that two successive **SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes during execution of the first SELECT. * http://www.postgresql.org/docs/8.4/static/transaction-iso.html So, in my opinion, unless neutron code has parts that rely on 'repeatable read' transaction isolation level (and I believe such code is possible, didn't inspected closely yet), switching to READ COMMITTED is fine for mysql. On multi-master scenario: it is not really an advanced use case. It is basic, we need to consider it as a basic and build architecture with respect to this fact. Retry approach fits well here, however it either requires proper isolation level, or redesign of whole DB access layer. Also, thanks Clint for clarification about example scenario described by Mike Bayer. Initially the issue was discovered with concurrent tests on multi master environment with galera as a DB backend. Thanks, Eugene On Thu, Nov 20, 2014 at 12:20 AM, Mike Bayer mba...@redhat.com wrote: On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com wrote: BTW, I view your examples from oslo as helping make my argument for me (and I don't think that was your intent :) ) I disagree with that as IMHO the differences in producing MM in the app layer against arbitrary backends (Postgresql vs. DB2 vs. MariaDB vs. ???) will incur a lot more “bifurcation” than a system that targets only a handful of existing MM solutions. The example I referred to in oslo.db is dealing with distinct, non MM backends. That level of DB-specific code and more is a given if we are building a MM system against multiple backends generically. It’s not possible to say which approach would be better or worse at the level of “how much database specific application logic do we need”, though in my experience, no matter what one is trying to do, the answer is always, “tons”; we’re dealing not just with databases but also Python drivers that have a vast amount of differences in behaviors, at every level.On top of all of that, hand-rolled MM adds just that much more application code to be developed and maintained, which also claims it will do a better job than mature (ish?) database systems designed to do the same job against a specific backend. My reason for asking this question here is that if the community wants to consider #2, then these problems are the place to start crafting that solution - if we solve the conflicts inherent with the two conncurrent thread scenarios, then I think we will find that we've solved the multi-master problem essentially for free”. 
Maybe I’m missing something, if we learn how to write out a row such that a concurrent transaction against the same row doesn’t throw us off, where is the part where that data is replicated to databases running concurrently on other IP numbers in a way that is atomic come out of that effort “for free” ? A home-rolled “multi master” scenario would have to start with a system that has multiple create_engine() calls, since we need to communicate directly to multiple database servers. From there it gets really crazy. Where’sall that ? Boiled down, what you are talking about here w.r.t. concurrent transactions is really conflict resolution, which is the hardest part of implementing multi-master (as a side note, using locking in this case is the equivalent of option #1). All I wished to point out is that there are other ways to solve the conflict resolution that could then be leveraged into a multi-master scenario. As for the parts that I glossed over, once conflict resolution is separated out, replication turns into a much simpler problem with well understood patterns and so I view that part as coming for free. Ryan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions
But the isolation mode change won’t really help here as pointed out by Jay; discrete transactions have to be used instead. I still think it will, per postgres documentation (which might look confusing, but still...) It actually helps for mysql, that was confirmed. For postgres it appears to be the same. Thanks, Eugene. On Thu, Nov 20, 2014 at 12:56 AM, Mike Bayer mba...@redhat.com wrote: On Nov 19, 2014, at 4:14 PM, Clint Byrum cl...@fewbar.com wrote: One simply cannot rely on multi-statement transactions to always succeed. agree, but the thing you want is that the transaction either succeeds or explicitly fails, the latter hopefully in such a way that a retry can be added which has a chance at succeeding, if needed. We have transaction replay logic in place in nova for example based on known failure conditions like concurrency exceptions, and this replay logic works, because it starts a new transaction. In this specific case, since it’s looping within a transaction where the data won’t change, it’ll never succeed, and the retry mechanism is useless. But the isolation mode change won’t really help here as pointed out by Jay; discrete transactions have to be used instead. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
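The "discrete transactions" alternative mentioned here, reduced to a sketch: the retry lives outside the transaction, so every attempt opens a new one and sees a fresh snapshot regardless of the isolation level. The exception name and session_factory are placeholders, not the actual Nova or oslo.db code.

    import functools
    import time

    class RetryRequest(Exception):
        """Raised by DB code when it detects it lost a concurrent update race."""

    def retry_on_conflict(max_retries=5, delay=0.1):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                for attempt in range(max_retries):
                    try:
                        return fn(*args, **kwargs)   # fn opens its own transaction
                    except RetryRequest:
                        if attempt == max_retries - 1:
                            raise
                        time.sleep(delay)
            return wrapper
        return decorator

    @retry_on_conflict()
    def allocate_something(session_factory, resource_id):
        session = session_factory()
        with session.begin():            # a brand new transaction per attempt
            # read, check, update; raise RetryRequest when the race is lost
            pass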
[openstack-dev] [Neutron] DB: transaction isolation and related questions
Hi neutron folks, There is an ongoing effort to refactor some neutron DB logic to be compatible with galera/mysql which doesn't support locking (with_lockmode('update')). Some code paths that used locking in the past were rewritten to retry the operation if they detect that an object was modified concurrently. The problem here is that all DB operations (CRUD) are performed in the scope of some transaction that makes complex operations execute in an atomic manner. For mysql the default transaction isolation level is 'REPEATABLE READ' which means that once the code issues a query within a transaction, this query will return the same result while in this transaction (e.g. the snapshot is taken by the DB during the first query and then reused for the same query). In other words, retry logic like the following will not work:

    def allocate_obj():
        with session.begin(subtrans=True):
            for i in xrange(n_retries):
                obj = session.query(Model).filter_by(**filters).first()
                count = session.query(Model).filter_by(
                    id=obj.id).update({'allocated': True})
                if count:
                    return obj

Since methods like allocate_obj() are usually called from within another transaction, we can't simply put the transaction inside the 'for' loop to fix the issue. The particular issue here is https://bugs.launchpad.net/neutron/+bug/1382064 with the proposed fix: https://review.openstack.org/#/c/129288 So far the solution proven by tests is to change the transaction isolation level for mysql to 'READ COMMITTED'. The patch suggests changing the level for the particular transaction where the issue occurs (per sqlalchemy, it will be reverted to the engine default once the transaction is committed). This isolation level allows the code above to see a different result in each iteration. At the same time, any code that relies on a repeated query under the same transaction giving the same result may potentially break. So the question is: what do you think about changing the default isolation level to READ COMMITTED for mysql project-wise? It is already so for postgres, however we don't have much concurrent test coverage to guarantee that it's safe to move to a weaker isolation level. Your feedback is appreciated. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron] Let's fix all neutron bugs! (NOT)
Hi folks, I have been supervising the bug list for neutron during the last release cycle, trying to do some housekeeping, prioritizing issues and fixing some of them. As you might see, the number of bugs (even those staying in the New state) doesn't go down. Bugs appear at quite a fast pace, and some of them hang around for quite a long time, especially if someone has assigned the bug to himself and then abandoned working on it. One of the other reasons for that is that we lack volunteers willing to fix those bugs. So, if you're willing to help, have some knowledge of neutron and its codebase, or you have a lab where you can reproduce (and hence, confirm) the bug and provide additional debugging info, that would be great! My plan is to get your contact, knowing what part of the neutron project you are familiar with, and then assign bugs directly to you if I feel that the issue matches your experience. I just want to make the bug triaging/fixing process a bit more iterative and efficient, with the help of the community. So please reach out directly to me and let me know what you are interested/experienced with. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron] Improving dhcp agent scheduling interface
Hi folks, I'd like to raise a discussion held in irc and in gerrit recently: https://review.openstack.org/#/c/131944/ The intention of the patch is to clean up a particular scheduling method/interface: schedule_network. Let me clarify why I think it needs to be done (besides code api consistency reasons): The scheduling process is ultimately just two steps: 1) choosing an appropriate agent for the network 2) adding a binding between the agent and the network To perform those two steps one doesn't need the network object; the network_id is sufficient. However, there is a concern that having the full dict (or full network object) could allow us to do more flexible things in step 1, like deciding whether the network should be scheduled at all. See the TODO for reference: https://github.com/openstack/neutron/blob/master/neutron/scheduler/dhcp_agent_scheduler.py#L64 However, this just puts an unnecessary (and actually, incorrect) requirement on the caller to provide the network dict, mainly because the caller doesn't know what content of the dict the callee (scheduler driver) expects. Currently the scheduler is only interested in the ID; if there is another scheduling driver, it may require additional parameters (like a list of full subnet dicts) in the dict which may or may not be provided by the calling code. Instead of making assumptions about what is in the dict, it's better to go with a simpler and clearer interface that will allow the scheduling driver to do whatever makes sense to it. In other words: the caller provides the id, the driver fetches everything it needs using the id. For existing scheduling drivers it's a no-op. I think l3 scheduling is an example of an interface done the right way; to me it looks clearer and more consistent. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
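An illustrative sketch of the interface shape argued for here (not the actual Neutron code): the caller hands over only the id, and a driver that wants more context fetches it itself. The plugin helper names (get_enabled_dhcp_agents, bind_network_to_agent, get_network) are assumptions for the sake of the example.

    import random

    class ChanceScheduler(object):
        """Default behaviour: the id alone is enough to choose and bind."""

        def schedule_network(self, plugin, context, network_id):
            agents = plugin.get_enabled_dhcp_agents(context)       # assumed helper
            if not agents:
                return None
            agent = random.choice(agents)                          # step 1: choose
            plugin.bind_network_to_agent(context, network_id, agent)  # step 2: bind
            return agent

    class SubnetAwareScheduler(ChanceScheduler):
        """A driver that needs more data fetches it by id itself, instead of
        requiring every caller to guess which fields to stuff into a dict."""

        def schedule_network(self, plugin, context, network_id):
            network = plugin.get_network(context, network_id)
            if not network.get('subnets'):
                return None          # nothing to schedule for this network yet
            return super(SubnetAwareScheduler, self).schedule_network(
                plugin, context, network_id)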
Re: [openstack-dev] [Neutron] Improving dhcp agent scheduling interface
My comments inline: I would argue that it isn't, actually. You may need to know the state of the network to make that placement decision. Yes, I could agree with that - and that's just a particular scheduling implementation, not a requirement for the interface. Just passing the id may cause the scheduling logic to issue an extra DB query that can be easily avoided if the right interface between the caller of a scheduler and the scheduler itself was in place. Yes, it may cause the scheduling logic to issue a query, *iff* it needs it. For instance we cannot fix [1] (as you pointed out) today because the method only accepts a dict that holds just a partial representation of the network. If we had the entire DB object we would avoid that and just passing the id is going in the opposite direction IMO And here is another issue, I think. Requiring an object is something not quite clear at this point: if scheduling needs to be aware of subnets - the network object is not enough then, and that's why I think we only need to provide ids that allow the scheduling logic to act on its own. However, there is a concern that having the full dict (or full network object) could allow us to do more flexible things in step 1, like deciding whether the network should be scheduled at all. That's the whole point of scheduling, is it not? Right, and we are just arguing about who should prepare the data needed to make the scheduling decision. I just think that the scheduling logic may potentially require more than just the network object. In my concrete example, I want to schedule a network which my code moves from a dead agent to some live agent. I only have a network id during that operation. I'd like to avoid issuing a DB query as well - just as you. My first thought was something like: self.schedule_network(context, {'id': network_id}) - which is clearly a dirty hack! But that's what the interface is forcing me to do. Or, it forces me to fetch the network, which I'd like to avoid as well. That's why I want the scheduling logic to decide whether it needs additional data or not. https://github.com/openstack/neutron/blob/master/neutron/scheduler/dhcp_agent_scheduler.py#L64 However, this just puts an unnecessary (and actually, incorrect) requirement on the caller to provide the network dict, mainly because the caller doesn't know what content of the dict the callee (scheduler driver) expects. Why is it incorrect? We should move away from dictionaries and passing objects Passing objects is for sure a much stronger api contract, however I think it leads to the same level of overhead, if not worse! For instance, will the network object include the collection of its subnet objects? Will they, in turn, include ipallocations and such? If the answer is No (and my opinion is that it *must* be No), then we don't win much with the object approach. If the answer is yes - we're fetching way too much from the DB to create the network object; it's a much bigger overhead than an additional db query in a scheduling driver. No, the scheduler needs to know about the state of the network to do proper placement. It's a side-effect of the default scheduling (i.e. random). If we want to do more intelligent placement we need the state of the network. That's for sure, the question is only about who prepares the data: caller or the scheduler. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Migrations in feature branch
Apparently I've mistakenly though that feature branch will form separate optional component. If it will eventually be a part of neutron - then it's fine. Thanks, Eugene. On Wed, Sep 24, 2014 at 1:30 PM, Salvatore Orlando sorla...@nicira.com wrote: Relying again on automatic schema generation could be error-prone. It can only be enabled globally, and does not work when models are altered if the table for the model being altered already exists in the DB schema. I don't think it would be a big problem to put these migrations in the main sequence once the feature branch is merged back into master. Alembic unfortunately does not yet do a great job in maintaining multiple timelines. Even if only a single migration branch is supported, in theory one could have a separate alembic environment for the feature branch, but that in my opinion just creates the additional problem of handling a new environment, and does not solve the initial problem of re-sequencing migrations. Re-sequencing at merge time is not going to be a problem in my opinion. However, keeping all the lbaas migrations chained together will help. You can also do as Henry suggests, but that option has the extra (possibly negligible) cost of squashing all migrations for the whole feature branch at merge time. As an example: MASTER --- X - X+1 - ... - X+n \ FEATURE \- Y - Y+1 - ... - Y+m At every rebase of rebase the migration timeline for the feature branch could be rearranged as follows: MASTER --- X - X+1 - ... - X+n --- \ FEATURE \- Y=X+n - Y+1 - ... - Y+m = X+n+m And therefore when the final merge in master comes, all the migrations in the feature branch can be inserted in sequence on top of master's HEAD. I have not tried this, but I reckon that conceptually it should work. Salvatore On 24 September 2014 08:16, Kevin Benton blak...@gmail.com wrote: If these are just feature branches and they aren't intended to be deployed for long life cycles, why don't we just skip the db migration and enable auto-schema generation inside of the feature branch? Then a migration can be created once it's time to actually merge into master. On Tue, Sep 23, 2014 at 9:37 PM, Brandon Logan brandon.lo...@rackspace.com wrote: Well the problem with resequencing on a merge is that a code change for the first migration must be added first and merged into the feature branch before the merge is done. Obviously this takes review time unless someone of authority pushes it through. We'll run into this same problem on rebases too if we care about keeping the migration sequenced correctly after rebases (which we don't have to, only on a merge do we really need to care). If we did what Henry suggested in that we only keep one migration file for the entire feature, we'd still have to do the same thing. I'm not sure that buys us much other than keeping the feature's migration all in one file. I'd also say that code in master should definitely NOT be dependent on code in a feature branch, much less a migration. This was a requirement of the incubator as well. So yeah this sounds like a problem but one that really only needs to be solved at merge time. There will definitely need to be coordination with the cores when merge time comes. Then again, I'd be a bit worried if there wasn't since a feature branch being merged into master is a huge deal. Unless I am missing something I don't see this as a big problem, but I am highly capable of being blind to many things. 
Thanks, Brandon On Wed, 2014-09-24 at 01:38 +, Doug Wiegley wrote: Hi Eugene, Just my take, but I assumed that we’d re-sequence the migrations at merge time, if needed. Feature branches aren’t meant to be optional add-on components (I think), nor are they meant to live that long. Just a place to collaborate and work on a large chunk of code until it’s ready to merge. Though exactly what those merge criteria are is also yet to be determined. I understand that you’re raising a general problem, but given lbaas v2’s state, I don’t expect this issue to cause many practical problems in this particular case. This is also an issue for the incubator, whenever it rolls around. Thanks, doug On September 23, 2014 at 6:59:44 PM, Eugene Nikanorov (enikano...@mirantis.com) wrote: Hi neutron and lbaas folks. Recently I briefly looked at one of lbaas proposed into feature branch. I see migration IDs there are lined into a general migration sequence. I think something is definitely wrong with this approach as feature-branch components are optional, and also master branch can't depend on revision IDs in feature-branch (as we moved to unconditional migrations) So far the solution to this problem that I see is to have separate migration script
[openstack-dev] [Neutron][LBaaS] Migrations in feature branch
Hi neutron and lbaas folks. Recently I briefly looked at one of the lbaas changes proposed into the feature branch. I see that migration IDs there are lined up into the general migration sequence. I think something is definitely wrong with this approach, as feature-branch components are optional, and also the master branch can't depend on revision IDs in the feature branch (as we moved to unconditional migrations). So far the solution to this problem that I see is to have a separate migration script, or in fact, a separate revision sequence. The problem is that DB models in the feature branch may depend on models of the master branch, which means that each revision of the feature branch should have a kind of minimum required revision of the master branch. The problem is that revision IDs don't form a linear order, so we can't have a 'minimum' unless that separate migration script can analyze the master branch migration sequence and find the minimum required migration ID. Thoughts? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
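For illustration, the re-sequencing Salvatore describes upthread comes down to one pointer in the first feature-branch migration: its down_revision names whatever master's HEAD is at rebase/merge time, and that is the only thing that has to change. A sketch of such an alembic migration file, with made-up revision ids and a made-up table:

    """add lbaas v2 loadbalancer table

    Revision ID: 1b2c3d4e5f6a
    Revises: 0a1b2c3d4e5f
    """

    # revision identifiers, used by Alembic.
    revision = '1b2c3d4e5f6a'        # first migration of the feature branch (Y)
    down_revision = '0a1b2c3d4e5f'   # master's HEAD (X+n); re-pointed on each rebase

    from alembic import op
    import sqlalchemy as sa

    def upgrade():
        op.create_table(
            'lbaas_loadbalancers',
            sa.Column('id', sa.String(36), primary_key=True),
            sa.Column('name', sa.String(255), nullable=True),
        )

    def downgrade():
        op.drop_table('lbaas_loadbalancers')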
Re: [openstack-dev] [Neutron] [LBaaS] Packet flow between instances using a load balancer
If we're talking about default haproxy driver for lbaas, then I'd say that the diagram is not quite correct because one could assume that LB_A and LB_B are kind of routing devices which have networks behind. Since haproxy is layer 4 loadbalancer, so packet received by RHB1 will have source of LB_B and destination of RHB1, and similarly for the opposite direction. In fact the packet is not modified, it's just different, since haproxy is not forwarding packets, it just opens connection to the backend server. Client's ip is usually forwarded via x-forwarded-for http header. Thanks, Eugene. On Thu, Sep 11, 2014 at 2:33 PM, Maish Saidel-Keesing maishsk+openst...@maishsk.com wrote: I am trying to find out how traffic currently flows went sent to an instance through a LB. Say I have the following scenario: RHA1 LB_A -- - LB_B --- RHB1 | | RHA2 ---| |- RHB2 A packet is sent from RHA1 to LB_B (with a final destination of course being either RHB1 or RHB2) I have a few questions about the flow. 1. When the packet is received by RHB1 - what is the source and destination address? Is the source RHA1 or LB_B? Is the destination LB_B or RHB_1? 2. When is the packet modified (if it is)? And how? 3. Traffic in the opposite direction. RHB1 - RHA1. What is the path that will be taken? The catalyst of this question was how to control traffic that is coming into instances through a LoadBalancer with security groups. At the moment you can either define a source IP/range or a security group. There is no way to add a LB to a security group (at least not that I know of). If the source IP that the packet is identified with - is the Load balancer (and I suspect it is) then there is no way to enforce the traffic flow. How would you all deal with this scenario and controlling the traffic flow? Any help / thoughts is appreciated! -- Maish Saidel-Keesing ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
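To make the last point concrete: because haproxy opens its own connection to the backend, the backend's socket peer is the load balancer, and the original client address (for HTTP traffic, and only when the proxy is configured to add the header) is available solely in X-Forwarded-For. A minimal WSGI sketch of recovering it on the member side; this is purely illustrative, not Neutron code:

    def app(environ, start_response):
        # Source address of the TCP connection: this is the LB / haproxy side.
        peer = environ.get('REMOTE_ADDR', '')
        # Original client as reported by the proxy. May be a comma-separated
        # chain if several proxies are involved; trust it only from known LBs.
        client = environ.get('HTTP_X_FORWARDED_FOR', peer)
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [('peer=%s client=%s' % (peer, client)).encode('utf-8')]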
[openstack-dev] [Neutron] Allowing specifying object ID in create_* API
Hi neutrons, We've been discussing various ways of doing cloud upgrades. One of the safe and viable solutions seems to be moving existing resources to a new cloud deployed with a new version of openstack. By saying 'moving' I mean replication of all resources and wiring everything together in a new cloud. This solution, while having its obvious drawbacks, is relatively simple, works through the cloud's public API and also allows users to move back to their existing working cloud at any time. One of the concerns of this approach seems to be the issue with client-side automation. E.g. everything that relies on existing cloud data. Technically speaking, there is no way to fully replicate objects, because the API of different components, including Neutron, doesn't allow IDs to be provided at creation time. Surprisingly, it's purely an API limitation; the db code supports an ID being passed from the API. So my question is - is this API limitation really critical? I know getting rid of it will require taking care of some additional things like DBDuplicateErrors, but are there other reasons to keep it? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
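For clarity, this is what the relaxed API would look like from a client's point of view: the caller supplies the UUID copied from the old cloud in the create body. This is hypothetical - today Neutron ignores or rejects a client-supplied id - and the token, endpoint and ids below are made up; it is just an illustration using python-requests.

    import json
    import requests

    body = {"network": {
        "id": "3f8a5e3a-9f0a-4b62-8c11-123456789abc",   # id copied from the old cloud
        "name": "replicated-net",
        "tenant_id": "d1f7d2a0c0de4a0f8a5e000000000001",
    }}

    resp = requests.post(
        "http://neutron.example.com:9696/v2.0/networks",
        headers={"X-Auth-Token": "ADMIN_TOKEN",
                 "Content-Type": "application/json"},
        data=json.dumps(body))
    print(resp.status_code, resp.json())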
Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API request
Hi folks, That actually going in opposite direction to what flavor framework is trying to do (and for dispatching it's doing the same as providers). REST call dispatching should really go via the root object. I don't quite get the issue with health monitors. If HM is incorrectly configured prior to association with a pool - API layer should handle that. I don't think driver implementations should be different at constraints to HM parameters. So I'm -1 on adding provider (or flavor) to each entity. After all, it looks just like data denormalization which actually will affect lots of API aspects in negative way. Thanks, Eugene. On Mon, Aug 11, 2014 at 11:20 PM, Vijay Venkatachalam vijay.venkatacha...@citrix.com wrote: Yes, the point was to say the plugin need not restrict and let driver decide what to do with the API. Even if the call was made to driver instantaneously, I understand, the driver might decide to ignore first and schedule later. But, if the call is present, there is scope for validation. Also, the driver might be scheduling an async-api to backend, in which case deployment error cannot be shown to the user instantaneously. W.r.t. identifying a provider/driver, how would it be to make tenant the default root object? tenantid is already associated with each of these entities, so no additional pain. For the tenant who wants to override let him specify provider in each of the entities. If you think of this in terms of the UI, let's say if the loadbalancer configuration is exposed as a single wizard (which has loadbalancer, listener, pool, monitor properties) then provider is chosen only once. Curious question, is flavour framework expected to address this problem? Thanks, Vijay V. -Original Message- From: Doug Wiegley [mailto:do...@a10networks.com] Sent: 11 August 2014 22:02 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API request Hi Sam, Very true. I think that Vijay’s objection is that we are currently imposing a logical structure on the driver, when it should be a driver decision. Certainly, it goes both ways. And I also agree that the mechanism for returning multiple errors, and the ability to specify whether those errors are fatal or not, individually, is currently weak. Doug On 8/11/14, 10:21 AM, Samuel Bercovici samu...@radware.com wrote: Hi Doug, In some implementations Driver !== Device. I think this is also true for HA Proxy. This might mean that there is a difference between creating a logical object and when there is enough information to actually schedule/place this into a device. The ability to express such errors (detecting an error on a logical object after it was created but when it actually get scheduled) should be discussed and addressed anyway. -Sam. -Original Message- From: Doug Wiegley [mailto:do...@a10networks.com] Sent: Monday, August 11, 2014 6:55 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API request Hi all, Validations such as ³timeout delay² should be performed on the API level before it reaches the driver. For a configuration tree (lb, listeners, pools, etc.), there should be one provider. You¹re right, but I think the point of Vijay¹s example was to highlight the combo error problem with populating all of the driver objects at once (in short, the driver interface isn¹t well suited to that model.) 
That his one example can be covered by API validators is irrelevant. Consider a backend that does not support APP_COOKIE¹s, or HTTPS_TERMINATED (but has multiple listeners) instead. Should the entire load balancer create fail, or should it offer degraded service? Do all drivers have to implement a transaction rollback; wait, the interface makes that very hard. That¹s his point. The driver is no longer just glue code between interfaces; it¹s now a mini-object error handler. Having provider defined in multiple places does not make sense. Channeling Brandon, who can yell if I get this wrong, the point is not to have a potentially different provider on each object. It¹s to allow a provider to be assigned when the first object in the tree is created, so that future related objects will always get routed to the same provider. Not knowing which provider should get all the objects is why we have to wait until we see a LoadBalancer object. All of this sort of edge case nonsense is because we (the royal we, the community), wanted all load balancer objects to be ³root² objects, even though only one of them is an actual root today, to support many-to-many relationships among all of them, at some future date, without an interface change. If my bias is showing that I¹m not a fan of adding this complexity
Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API request
Well, that exactly what we've tried to solve with tags in the flavor. Considering your example with whole configuration being sent to the driver - i think it will be fine to not apply unsupported parts of configuration (like such HM) and mark the HM object with error status/status description. Thanks, Eugene. On Tue, Aug 12, 2014 at 12:33 AM, Brandon Logan brandon.lo...@rackspace.com wrote: Hi Eugene, An example of the HM issue (and really this can happen with any entity) is if the driver the API sends the configuration to does not actually support the value of an attribute. For example: Provider A support PING health monitor type, Provider B does not. API allows the PING health monitor type to go through. Once a load balancer has been linked with that health monitor and the LoadBalancer chose to use Provider B, that entire configuration is then sent to the driver. The driver errors out not on the LoadBalancer create, but on the health monitor create. I think that's the issue. Thanks, Brandon On Tue, 2014-08-12 at 00:17 +0400, Eugene Nikanorov wrote: Hi folks, That actually going in opposite direction to what flavor framework is trying to do (and for dispatching it's doing the same as providers). REST call dispatching should really go via the root object. I don't quite get the issue with health monitors. If HM is incorrectly configured prior to association with a pool - API layer should handle that. I don't think driver implementations should be different at constraints to HM parameters. So I'm -1 on adding provider (or flavor) to each entity. After all, it looks just like data denormalization which actually will affect lots of API aspects in negative way. Thanks, Eugene. On Mon, Aug 11, 2014 at 11:20 PM, Vijay Venkatachalam vijay.venkatacha...@citrix.com wrote: Yes, the point was to say the plugin need not restrict and let driver decide what to do with the API. Even if the call was made to driver instantaneously, I understand, the driver might decide to ignore first and schedule later. But, if the call is present, there is scope for validation. Also, the driver might be scheduling an async-api to backend, in which case deployment error cannot be shown to the user instantaneously. W.r.t. identifying a provider/driver, how would it be to make tenant the default root object? tenantid is already associated with each of these entities, so no additional pain. For the tenant who wants to override let him specify provider in each of the entities. If you think of this in terms of the UI, let's say if the loadbalancer configuration is exposed as a single wizard (which has loadbalancer, listener, pool, monitor properties) then provider is chosen only once. Curious question, is flavour framework expected to address this problem? Thanks, Vijay V. -Original Message- From: Doug Wiegley [mailto:do...@a10networks.com] Sent: 11 August 2014 22:02 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API request Hi Sam, Very true. I think that Vijay’s objection is that we are currently imposing a logical structure on the driver, when it should be a driver decision. Certainly, it goes both ways. And I also agree that the mechanism for returning multiple errors, and the ability to specify whether those errors are fatal or not, individually, is currently weak. Doug On 8/11/14, 10:21 AM, Samuel Bercovici samu...@radware.com wrote: Hi Doug, In some implementations Driver !== Device. 
I think this is also true for HA Proxy. This might mean that there is a difference between creating a logical object and when there is enough information to actually schedule/place this into a device. The ability to express such errors (detecting an error on a logical object after it was created but when it actually get scheduled) should be discussed and addressed anyway. -Sam. -Original Message- From: Doug Wiegley [mailto:do...@a10networks.com] Sent: Monday, August 11, 2014 6:55 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Continuing on Calling driver interface on every API
[openstack-dev] [Neutron] Bug squashing day
Hi neutron folks, Today should have been 'Bug squashing day' where we go over existing bugs filed for the project and triage/prioritize/comment on them. I've created an etherpad with a (hopefully) full list of neutron bugs: https://etherpad.openstack.org/p/neutron-bug-squashing-day-2014-08-07 I was able to walk through some of the almost one thousand bugs we have. My target was to reduce the number of open bugs, so some of them I moved to incomplete/invalid/won't fix state (not many though); then, to reduce the number of high importance bugs, especially if they've been hanging for too long. As you can see, bugs in the etherpad are sorted by importance. Some of my observations include: - almost all bugs with High priority really seem like issues we should be fixing. In many cases the submitter or initial contributor abandoned his work on the bug... - there are a couple of important bugs related to DVR where previously working stuff is broken, but in all cases there are DVR subteam members working on those, so we're good here so far. I also briefly described the resolution for each bug, where 'n/a' means that the bug just needs to be fixed/work should be continued without any change to state. I'm planning to continue to go over this list and expect more bugs will go away which previously have been marked as medium/low or wishlist. If anyone is willing to help - you're welcome! Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] How to improve the specs review process (was Re: [Neutron] Group Based Policy and the way forward)
On Wed, Aug 6, 2014 at 11:07 PM, Stefano Maffulli stef...@openstack.org wrote: On 08/06/2014 11:19 AM, Edgar Magana wrote: That is the beauty of the open source projects, there is always a smarter reviewer catching the facts that you don't. And yet, the specification clearly talks about 'endpoints' and nobody caught it where it was supposed to be caught so I fear that something failed badly here: I know that there's a whole other thread on naming. I believe everybody has reviewed this having keystone's endpoint in mind and understanding that those are different terms, where keystone endpoints should have been named 'service_endpoints' or something. There are no UX or technical reasons not to reuse terms used in different projects and in different domains. So I don't think it's fair to blame reviewers here. Thanks, Eugene. https://review.openstack.org/#/c/89469/10 What failed and how do we make sure this doesn't happen again? This to me is the most important question to answer. If I remember correctly we introduced the concept of Specs exactly to discuss the ideas *before* the implementation starts. We wanted things like architecture, naming conventions and other important decisions to be socialized and agreed upon *before* code was proposed. We wanted to avoid developers spending time implementing features in ways that are incompatible and likely to be rejected at code review time. And yet, here we are. Something failed and I would ask all core reviewers to sit down and do an exercise to identify the root cause. If you want we can start from this specific case, do some simple root cause analysis together and take GBP as an example. Thoughts? /stef ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Cross-server locking for neutron server
In fact there are more applications for distributed locking than just accessing data in database. One of such use cases is serializing access to devices. This is what is not yet hardly needed, but will be as we get more service drivers working with appliances. It would be great if some existing library could be adopted for it. Thanks, Eugene. On Thu, Jul 31, 2014 at 12:53 AM, Jay Pipes jaypi...@gmail.com wrote: On 07/30/2014 12:21 PM, Kevin Benton wrote: Maybe I misunderstood your approach then. I though you were suggesting where a node performs an UPDATE record WHERE record = last_state_node_saw query and then checks the number of affected rows. That's optimistic locking by every definition I've heard of it. It matches the following statement from the wiki article you linked to as well: The latter situation (optimistic locking) is only appropriate when there is less chance of someone needing to access the record while it is locked; otherwise it cannot be certain that the update will succeed because the attempt to update the record will fail if another user updates the record first. Did I misinterpret how your approach works? The record is never locked in my approach, why is why I don't like to think of it as optimistic locking. It's more like optimistic read and update with retry if certain conditions continue to be met... :) To be very precise, the record is never locked explicitly -- either through the use of SELECT FOR UPDATE or some explicit file or distributed lock. InnoDB won't even hold a lock on anything, as it will simply add a new version to the row using its MGCC (sometimes called MVCC) methods. The technique I am showing in the patch relies on the behaviour of the SQL UPDATE statement with a WHERE clause that contains certain columns and values from the original view of the record. The behaviour of the UPDATE statement will be a NOOP when some other thread has updated the record in between the time that the first thread read the record, and the time the first thread attempted to update the record. The caller of UPDATE can detect this NOOP by checking the number of affected rows, and retry the UPDATE if certain conditions remain kosher... So, there's actually no locks taken in the entire process, which is why I object to the term optimistic locking :) I think where the confusion has been is that the initial SELECT and the following UPDATE statements are *not* done in the context of a single SQL transaction... Best, -jay On Wed, Jul 30, 2014 at 11:07 AM, Jay Pipes jaypi...@gmail.com mailto:jaypi...@gmail.com wrote: On 07/30/2014 10:53 AM, Kevin Benton wrote: Using the UPDATE WHERE statement you described is referred to as optimistic locking. [1] https://docs.jboss.org/__jbossas/docs/Server___ Configuration_Guide/4/html/__The_CMP_Engine-Optimistic___Locking.html https://docs.jboss.org/jbossas/docs/Server_ Configuration_Guide/4/html/The_CMP_Engine-Optimistic_Locking.html SQL != JBoss. It's not optimistic locking in the database world. In the database world, optimistic locking is an entirely separate animal: http://en.wikipedia.org/wiki/__Lock_(database) http://en.wikipedia.org/wiki/Lock_(database) And what I am describing is not optimistic lock concurrency in databases. 
-jay _ OpenStack-dev mailing list OpenStack-dev@lists.openstack.__org mailto:OpenStack-dev@lists.openstack.org http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack-dev http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Kevin Benton ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
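As an illustration of the update-with-retry pattern Jay describes (not the actual Neutron code): a minimal SQLAlchemy (1.4+ style) sketch, assuming a hypothetical `routers` table with a `revision` column used as the compare value. The initial read and the UPDATE deliberately run in separate transactions, and success is detected purely from the affected row count.

    import sqlalchemy as sa

    metadata = sa.MetaData()

    # Hypothetical table, used only to illustrate the pattern under discussion.
    routers = sa.Table(
        'routers', metadata,
        sa.Column('id', sa.String(36), primary_key=True),
        sa.Column('status', sa.String(16), nullable=False),
        sa.Column('revision', sa.Integer, nullable=False, default=0),
    )

    def update_status(engine, router_id, new_status, max_retries=5):
        """Read a row, then UPDATE it with a WHERE clause that repeats the
        values just read.  No SELECT FOR UPDATE, no explicit lock: if another
        thread changed the row in between, the UPDATE matches zero rows and
        we retry with fresh data."""
        for _ in range(max_retries):
            with engine.connect() as conn:
                row = conn.execute(
                    sa.select(routers.c.status, routers.c.revision)
                    .where(routers.c.id == router_id)
                ).first()
            if row is None:
                return False  # no such record

            with engine.begin() as conn:  # separate transaction from the read
                result = conn.execute(
                    routers.update()
                    .where(routers.c.id == router_id,
                           routers.c.status == row.status,
                           routers.c.revision == row.revision)
                    .values(status=new_status, revision=row.revision + 1)
                )
                affected = result.rowcount

            if affected == 1:
                return True   # our view of the record was still current
            # affected == 0: another writer got there first -- loop and retry
        return False

Because no row lock is ever taken, a concurrent writer simply causes a zero-rowcount UPDATE followed by a retry, which matches the behaviour discussed in the thread above.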
[openstack-dev] [Neutron] Flavor Framework spec approval deadline exception
Hi folks, I'd like to request an exception for the Flavor Framework spec: https://review.openstack.org/#/c/102723/ It already has a more or less complete server-side implementation: https://review.openstack.org/#/c/105982/ The CLI will be posted for review soon. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] Flavor framework proposal
think it will become more obvious as projects like Octavia gain maturity whether they should be split off and become completely independent, be loosely coupled, or simply remain a vendor of Neutron LBaaS. :) How flavors get handled in these scenarios is part of that discussion-- but that discussion probably isn't relevant right now. If you want to provide the illusion of two different top-level services / API endpoints having the same flavor, then I would say, that's what orchestration is for. Totally agree on this point. On Tue, Jul 15, 2014 at 2:07 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi Stephen, So, as was discussed, existing proposal has some aspects which better to be postponed, like extension list on the flavor (instead of tags). Agreed-- I think we need to more fully flesh out how extension list / tags should work here before we implement it. But this doesn't prevent us from rolling forward with a version 1 of flavors so that we can start to use some of the benefits of having flavors (like the ability to use multiple service profiles with a single driver/provider, or multiple service profiles for a single kind of service). Particularly that idea has several drawbacks: - it makes public API inflexible - turning features on/off is not what flavors should be doing, it's a task for policy framework and not flavors - flavor-based rest call dispatching is quite complex solution giving no benefits for service plugins I'm confused as to what you mean by that idea here. Are you taking about the extension list? If this is the case, I agree that that aspect needs to be refined and should probably be postponed if possible. I was under the impression that the extension list was a set of neutron extensions supported by all the service providers in the flavour. This could probably be enforced as an API level constraint. I don't think there's much agreement on what the extension list should actually be at the present time (honestly, it seem close enough to the 'tags' proposal in Eugene's original spec that I keep mistaking one for the other). This is one of the reasons I'm in favor of deferring that discussion until version 2 of flavors, and simply working with the free-form 'description' field for now which is informational but shouldn't be considered programmatically consumable. While this is not explicitly written in proposal - that's what implied there. I think that one is a major blocker of the proposal right now, it deserves future discussion and not essential to the problem flavors are supposed to solve. Yes, I think there are many benefits we can get out of the flavor framework without having to have an extensions list / tags at this revision. But I'm curious: Did we ever define what we were actually trying to solve with flavors? Maybe that's the reason the discussion on this has been all of the place: People are probably making assumptions about the problem we're trying to solve and we need to get on the same page about this. That is what I've been saying for over a year. Whatever you call it service type, service provider or flavour it appears that's impossible to find two persons who think about it in the same way. Haha! Great! Well, when it comes to proposals like this, I prefer to think of them in terms of what problem are we trying to solve and then limiting scope so we can actually produce something before the cold death of the universe. So! 
With that in mind, I agree with what Eugene said on this: The original problem has several aspects: 1) providing users with some information about what service implementation they get (capabilities) 2) providing users with ability to specify (choose, actually) some implementation details that don't relate to a logical configuration (capacity, insertion mode, HA mode, resiliency, security standards, etc) 3) providing operators a way to setup different modes of one driver 4) providing operators to seamlessly change drivers backing up existing logical configurations (now it's not so easy to do because logical config is tightly coupled with provider/driver) The proposal we're discussing right is mostly covering points (2), (3) and (4) which is already a good thing. So for now I'd propose to put 'information about service implementation' in the description to cover (1) If there are problems people are trying to solve with flavors that are not among the above 4 points, I would suggest that these either become a part of a later revision of flavors, or simply get discussed as a new entity entirely (depending on what people are after). Anyone have objections to this? (Sorry for the abruptness on my part on this: We really need to get flavors into Juno because we have features of LBaaS which are probably going to depend on it. I realize I've been dominating this discussion because of the time pressure here, but I really am interested
Re: [openstack-dev] [Neutron] Flavor framework proposal
Hi Stephen, So, as was discussed, existing proposal has some aspects which better to be postponed, like extension list on the flavor (instead of tags). Particularly that idea has several drawbacks: - it makes public API inflexible - turning features on/off is not what flavors should be doing, it's a task for policy framework and not flavors - flavor-based rest call dispatching is quite complex solution giving no benefits for service plugins While this is not explicitly written in proposal - that's what implied there. I think that one is a major blocker of the proposal right now, it deserves future discussion and not essential to the problem flavors are supposed to solve. Other than that, I personally don't have much disagreements on the proposal. The question about service type on the flavor is minor IMO. We can allow it to be NULL, which would mean multiservice flavor. However, multiservice flavors may put some minor requirements to driver API (that's mainly because of how flavor plugin interacts with service plugins) Thanks, Eugene. On Tue, Jul 15, 2014 at 11:21 PM, Stephen Balukoff sbaluk...@bluebox.net wrote: Hi folks! I've noticed progress on the flavor framework discussion slowing down over the last week. We would really like to see this happen for Juno because it's critical for many of the features we'd also like to get into Juno for LBaaS. I understand there are other Neutron extensions which will need it too. The proposal under discussion is here: https://review.openstack.org/#/c/102723/ One of the things I've seen come up frequently in the comments is the idea that a single flavor would apply to more than one service type (service type being 'LBaaS', 'FWaaS', 'VPNaaS', etc.). I've commented extensively on this, and my opinion is that this doesn't make a whole lot of sense. However, there are people who have a different view on this, so I would love to hear from them: Could you describe a specific usage scenario where this makes sense? What are the characteristics of a flavor that applies to more than one service type? Let's see if we can come to some conclusions on this so that we can get flavors into Juno, please! Thanks, Stephen -- Stephen Balukoff Blue Box Group, LLC (800)613-4305 x807 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Flavor framework: Conclusion
Hi folks, I will try to respond both recent emails: What I absolutely don’t want is users getting Bronze load balancers and using TLS and L7 on them. My main objection of having extension list on the flavor is that it is actually doesn't allow you to do what you want to do. Flavor is the entity that is used when user creates service instance, like loadbalancer, firewall or vpnservice objects. The extensions you are talking about provide access to REST resources which may not be directly attached to an instance. Which means that user may create those object without bothering with flavors at all. You can't turn off access to those REST resources, because user doesn't need to use flavors to access them. The second objection is more minor - this is a different problem then we are trying to solve right now. I suggested to postpone it until we have clearer vision of how it is going to work. My understanding of the flavor framework was that by specifying (or not specifying) extensions I can create a diverse offering meeting my business needs. Well, that's actually is not difficult: we have metadata in a service profile, admin can turn extensions on and off there. As I said before, extension in the flavor is too coarse-grained to specify supported API aspects, secondly, it can't be used to actually turn extensions on or off. The way you are describing it the user selects, say a bronze flavor, and the system might or might not put it on a load balancer with TLS. In first implementation this would be the responsibility of the description to provide such information, and the responsibility of a admin to provide proper mapping between flavor and service profile. in your example, say if I don’t have any TLS capable load balancers left and the user requests them How user can request such load balancer if he/she doesn't see appropriate flavor? I'm just telling that if extension list on the flavor doesn't solve the problems it supposed to solve - it's no better than providing such information in the description. To Mark's comments: The driver should not be involved. We can use the supported extensions to determine is associated logical resources are supported. In example above - user may only know about certain limitations when accessing core API, which you can't turn off. Say, create a listener with certificate_id (or whatever object is responsible for keeping a certificate). In other words: in order to perform any kind of dispatching that will actually turn off access to TLS (in the core API) we will need to implement some complex dispatching which consider not only REST resources of the extension, but also attributes of the core API used in the request. I think that's completely unnecessary. Otherwise driver behaviors will vary wildly I don't see why it should. Once admin provided proper mapping between flavor and service profile (where, as I suggested above, you may turn on/off the extensions with metadata) driver should behave according to the flavor. It's then up to our implementation on what to return to user in case it tries to access the extension unsupported in a given mode. But it still will work at the point of association (cert with listener, l7 policy with listener, etc) Another point is that you look at the extension list more closely - you'll see that it's no better then tags, and that's the reason to move that to service profile's metadata. 
I don't think dispatching should be done on the basis of what is defined on the flavor - it is a complex solution giving no benefits over existing dispatching method. Thanks, Eugene. On Mon, Jul 7, 2014 at 8:41 PM, Mark McClain mmccl...@yahoo-inc.com wrote: On Jul 4, 2014, at 1:09 AM, Eugene Nikanorov enikano...@mirantis.com wrote: German, First of all extension list looks lbaas-centric right now. Actually far from it. SSL VPN should be service extension. Secondly, TLS and L7 are such APIs which objects should not require loadbalancer or flavor to be created (like pool or healthmonitor that are pure db objects). Only when you associate those objects with loadbalancer (or its child objects), driver may tell if it supports them. Which means that you can't really turn those on or off, it's a generic API. The driver should not be involved. We can use the supported extensions to determine is associated logical resources are supported. Otherwise driver behaviors will vary wildly. Also deferring to driver exposes a possible way for a tenant to utilize features that may not be supported by the operator curated flavor. From user perspective flavor description (as interim) is sufficient to show what is supported by drivers behind the flavor. Supported extensions are critical component for this. Also, I think that turning extensions on/off is a bit of side problem to a service specification, so let's resolve it separately. Thanks, Eugene. On Fri, Jul 4, 2014 at 3:07 AM, Eichberger, German
Re: [openstack-dev] [neutron] Flavor framework: Conclusion
Hi, Mark and me has spent some time today discussing existing proposals and I think we got to a consensus. Initially I had two concerns about Mark's proposal which are - extension list attribute on the flavor - driver entry point on the service profile The first idea (ext list) need to be clarified more as we get more drivers that needs it. Right now we have FWaaS/VPNaaS which don't have extensions at all and we have LBaaS where all drivers support all extensions. So extension list can be postponed until we clarify how exactly we want this to be exposed to the user and how we want it to function on implementation side. Driver entry point which implies dynamic loading per admin's request is a important discussion point (at least, previously this idea received negative opinions from some cores) We'll implement service profiles, but this exact aspect of how driver is specified/loadede will be discussed futher. So based on that I'm going to start implementing this. I think that implementation result will allow us to develop in different directions (extension list vs tags, dynamic loading and such) depending on more information about how this is utilized by deployers and users. Thanks, Eugene. On Thu, Jul 3, 2014 at 5:57 PM, Susanne Balle sleipnir...@gmail.com wrote: +1 On Wed, Jul 2, 2014 at 10:12 PM, Kyle Mestery mest...@noironetworks.com wrote: We're coming down to the wire here with regards to Neutron BPs in Juno, and I wanted to bring up the topic of the flavor framework BP. This is a critical BP for things like LBaaS, FWaaS, etc. We need this work to land in Juno, as these other work items are dependent on it. There are still two proposals [1] [2], and after the meeting last week [3] it appeared we were close to conclusion on this. I now see a bunch of comments on both proposals. I'm going to again suggest we spend some time discussing this at the Neutron meeting on Monday to come to a closure on this. I think we're close. I'd like to ask Mark and Eugene to both look at the latest comments, hopefully address them before the meeting, and then we can move forward with this work for Juno. Thanks for all the work by all involved on this feature! I think we're close and I hope we can close on it Monday at the Neutron meeting! Kyle [1] https://review.openstack.org/#/c/90070/ [2] https://review.openstack.org/102723 [3] http://eavesdrop.openstack.org/meetings/networking_advanced_services/2014/networking_advanced_services.2014-06-27-17.30.log.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
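For readers following along, a rough sketch of the flavor / service-profile split being agreed on here. All class and field names are assumptions for illustration, not the spec: the point is that the user-facing flavor stays descriptive while the admin-side service profile carries the driver reference and operator metadata.

    # Illustrative only: a pared-down flavor / service-profile mapping.

    class ServiceProfile(object):
        """Admin-side object: which driver backs the flavor, plus operator
        metadata (e.g. {'ha': True}) that the driver can interpret."""
        def __init__(self, driver, metainfo=None):
            self.driver = driver
            self.metainfo = metainfo or {}

    class Flavor(object):
        """User-facing object: name, service type and a human-readable
        description; the profile list stays invisible to tenants."""
        def __init__(self, name, service_type, description='', profiles=None):
            self.name = name
            self.service_type = service_type   # e.g. 'LOADBALANCER'
            self.description = description
            self.profiles = profiles or []

    def select_profile(flavor):
        """Pick a service profile for a new logical resource.  Trivial policy
        here; the point is that re-pointing a flavor at a different profile
        never touches user-visible configuration."""
        if not flavor.profiles:
            raise RuntimeError('flavor %s has no service profiles' % flavor.name)
        return flavor.profiles[0]

    # Example: a "gold" flavor backed by an HA haproxy profile.
    gold = Flavor('gold', 'LOADBALANCER',
                  description='HA, TLS termination, L7 rules',
                  profiles=[ServiceProfile('haproxy_ns', {'ha': True})])
    profile = select_profile(gold)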
Re: [openstack-dev] [Neutron][LBaaS] Layer7 Switching - L7 Rule - comapre_type values
I also don't think it is fair for certain drivers to hold other drivers hostage For some time there was a policy (openstack-wide) that public API should have a free open source implementation. In this sense open source driver may hold other drivers as hostages. Eugene. On Thu, Jul 3, 2014 at 10:37 PM, Jorge Miramontes jorge.miramon...@rackspace.com wrote: I agree. Also, since we are planning on having two different API versions run in parallel the only driver that needs to be worked on initially is the reference implementation. I'm guessing we will have two reference implementations, one for v1 and one for v2. The v2 implementation currently seems to be modified from v1 in order to get the highest velocity in terms of exposing API functionality. There is a reason we aren't working on Octavia right now and I think the same rationale holds for other drivers. So, I believe we should expose as much functionality possible with a functional open-source driver and then other drivers will catch up. As for drivers that can't implement certain features the only potential issue I see is a type of vendor lock-in. For example, let's say I am an operator agnostic power API user. I host with operator A and they use a driver that implements all functionality exposed via the API. Now, let's say I want to move to operator B because operator A isn't working for me. Let's also say that operator B doesn't implement all functionality exposed via the API. From the user's perspective they are locked out of going to operator B because their API integrated code won't port seamlessly. With this example in mind, however, I also don't think it is fair for certain drivers to hold other drivers hostage. From my perspective, if users really want a feature then every driver implementor should have the incentive to implement said feature and will benefit them in the long run. Anyways, that my $0.02. Cheers, --Jorge From: Stephen Balukoff sbaluk...@bluebox.net Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Tuesday, June 24, 2014 7:30 PM To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Layer7 Switching - L7 Rule - comapre_type values Making sure all drivers support the features offered in Neutron LBaaS means we are stuck going with the 'least common denominator' in all cases. While this ensures all vendors implement the same things in the functionally the same way, it also is probably a big reason the Neutron LBaaS project has been so incredibly slow in seeing new features added over the last two years. In the gerrit review that Dustin linked, it sounds like the people contributing to the discussion are in favor of allowing drivers to reject some configurations as unsupported through use of exceptions (details on how that will work is being hashed out now if you want to participate in that discussion). Let's assume, therefore, that with the LBaaS v2 API and Object model we're also going to get this ability-- which of course also means that drivers do not have to support every feature exposed by the API. (And again, as Dustin pointed out, a Linux LVS-based driver definitely wouldn't be able to support any L7 features at all, yet it's still a very useful driver for many deployments.) Finally, I do not believe that the LBaaS project should be held back because one vendor's implementation doesn't work well with a couple features exposed in the API. 
As Dustin said, let the API expose a rich feature set and allow drivers to reject certain configurations when they don't support them. Stephen On Tue, Jun 24, 2014 at 9:09 AM, Dustin Lundquist dus...@null-ptr.net wrote: I brought this up on https://review.openstack.org/#/c/101084/. -Dustin On Tue, Jun 24, 2014 at 7:57 AM, Avishay Balderman avish...@radware.com wrote: Hi Dustin I agree with the concept you described but as far as I understand it is not currently supported in Neutron. So a driver should be fully compatible with the interface it implements. Avishay *From:* Dustin Lundquist [mailto:dus...@null-ptr.net] *Sent:* Tuesday, June 24, 2014 5:41 PM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* Re: [openstack-dev] [Neutron][LBaaS] Layer7 Switching - L7 Rule - comapre_type values I think the API should provide an richly featured interface, and individual drivers should indicate if they support the provided configuration. For example there is a spec for a Linux LVS LBaaS driver, this driver would not support TLS termination or any layer 7 features, but would still be valuable for some deployments. The user experience of such a solution could be improved if the driver to propagate up a message specifically identifying the unsupported feature. -Dustin On Tue, Jun 24, 2014
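A small sketch of the "rich API, drivers reject what they can't support" approach advocated in this thread. The exception name and driver interface are made up for illustration; the real mechanism was being hashed out in the review linked above.

    class UnsupportedFeature(Exception):
        """Raised by a driver when a logical config uses a feature it lacks."""

    class LvsDriver(object):
        """Illustrative LVS-style L4 driver: no L7 rules, no TLS termination,
        but still useful for plain TCP load balancing."""

        def create_listener(self, listener):
            if listener.get('l7_policies'):
                raise UnsupportedFeature(
                    'L7 rules are not supported by this driver')
            if listener.get('default_tls_container_id'):
                raise UnsupportedFeature(
                    'TLS termination is not supported by this driver')
            self._program_lvs(listener)

        def _program_lvs(self, listener):
            # Real ipvsadm/netlink programming would go here.
            pass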
Re: [openstack-dev] [neutron] Flavor framework: Conclusion
German, First of all extension list looks lbaas-centric right now. Secondly, TLS and L7 are such APIs which objects should not require loadbalancer or flavor to be created (like pool or healthmonitor that are pure db objects). Only when you associate those objects with loadbalancer (or its child objects), driver may tell if it supports them. Which means that you can't really turn those on or off, it's a generic API. From user perspective flavor description (as interim) is sufficient to show what is supported by drivers behind the flavor. Also, I think that turning extensions on/off is a bit of side problem to a service specification, so let's resolve it separately. Thanks, Eugene. On Fri, Jul 4, 2014 at 3:07 AM, Eichberger, German german.eichber...@hp.com wrote: I am actually a bit bummed that Extensions are postponed. In LBaaS we are working hard on L7 and TLS extensions which we (the operators) like to switch on and off with different flavors... German -Original Message- From: Kyle Mestery [mailto:mest...@noironetworks.com] Sent: Thursday, July 03, 2014 2:00 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [neutron] Flavor framework: Conclusion Awesome, thanks for working on this Eugene and Mark! I'll still leave an item on Monday's meeting agenda to discuss this, hopefully it can be brief. Thanks, Kyle On Thu, Jul 3, 2014 at 10:32 AM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi, Mark and me has spent some time today discussing existing proposals and I think we got to a consensus. Initially I had two concerns about Mark's proposal which are - extension list attribute on the flavor - driver entry point on the service profile The first idea (ext list) need to be clarified more as we get more drivers that needs it. Right now we have FWaaS/VPNaaS which don't have extensions at all and we have LBaaS where all drivers support all extensions. So extension list can be postponed until we clarify how exactly we want this to be exposed to the user and how we want it to function on implementation side. Driver entry point which implies dynamic loading per admin's request is a important discussion point (at least, previously this idea received negative opinions from some cores) We'll implement service profiles, but this exact aspect of how driver is specified/loadede will be discussed futher. So based on that I'm going to start implementing this. I think that implementation result will allow us to develop in different directions (extension list vs tags, dynamic loading and such) depending on more information about how this is utilized by deployers and users. Thanks, Eugene. On Thu, Jul 3, 2014 at 5:57 PM, Susanne Balle sleipnir...@gmail.com wrote: +1 On Wed, Jul 2, 2014 at 10:12 PM, Kyle Mestery mest...@noironetworks.com wrote: We're coming down to the wire here with regards to Neutron BPs in Juno, and I wanted to bring up the topic of the flavor framework BP. This is a critical BP for things like LBaaS, FWaaS, etc. We need this work to land in Juno, as these other work items are dependent on it. There are still two proposals [1] [2], and after the meeting last week [3] it appeared we were close to conclusion on this. I now see a bunch of comments on both proposals. I'm going to again suggest we spend some time discussing this at the Neutron meeting on Monday to come to a closure on this. I think we're close. 
I'd like to ask Mark and Eugene to both look at the latest comments, hopefully address them before the meeting, and then we can move forward with this work for Juno. Thanks for all the work by all involved on this feature! I think we're close and I hope we can close on it Monday at the Neutron meeting! Kyle [1] https://review.openstack.org/#/c/90070/ [2] https://review.openstack.org/102723 [3] http://eavesdrop.openstack.org/meetings/networking_advanced_services /2014/networking_advanced_services.2014-06-27-17.30.log.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org
Re: [openstack-dev] [Neutron][LBaaS] Which entities need status
Hi lbaas folks, IMO a status is really an important part of the API. In some old email threads Sam proposed a solution for lbaas objects: we need to have several attributes that independently represent different types of statuses: - admin_state_up - operational status - provisioning state Not every status needs to be on every object. Pure-DB objects (like pool) should not have provisioning state and operational status; instead, an association object should have them. I think that resolves both questions (1) and (2). If some object is shareable, then we'll have an association object anyway, and that's where provisioning status and operational status can reside. For sure it's not very simple, but this is the right way to do it. Also I'd like to emphasize that statuses are really an API thing, not a driver thing, so they must be used similarly across all drivers. Thanks, Eugene. On Tue, Jun 24, 2014 at 10:53 PM, Doug Wiegley do...@a10networks.com wrote: Hi Brandon, I think just one status is overloading too much onto the LB object (which is perhaps something that a UI should do for a user, but not something an API should be doing.) 1) If an entity exists without a link to a load balancer it is purely just a database entry, so it would always be ACTIVE, but not really active in a technical sense. Depends on the driver. I don't think this is a decision for lbaas proper. 2) If some of these entities become shareable then how does the status reflect that the entity failed to create on one load balancer but was successfully created on another. That logic could get overly complex. That's a status on the join link, not the object, and I could argue multiple ways in which that should be one way or another based on the backend, which to me again implies a driver question (backend could queue for later, or error immediately, or let things run degraded, or...) Thanks, Doug On 6/24/14, 11:23 AM, Brandon Logan brandon.lo...@rackspace.com wrote: I think we missed this discussion at the meet-up but I'd like to bring it up here. To me having a status on all entities doesn't make much sense, and just having a status on a load balancer (which would be a provisioning status) and a status on a member (which would be an operational status) are what makes sense because: 1) If an entity exists without a link to a load balancer it is purely just a database entry, so it would always be ACTIVE, but not really active in a technical sense. 2) If some of these entities become shareable then how does the status reflect that the entity failed to create on one load balancer but was successfully created on another. That logic could get overly complex. I think the best thing to do is to have the load balancer status reflect the provisioning status of all of its children. So if a health monitor is updated then the load balancer that health monitor is linked to would have its status changed to PENDING_UPDATE. Conversely, if a load balancer or any entities linked to it are changed and the load balancer's status is in a non-ACTIVE state then that update should not be allowed. Thoughts? Thanks, Brandon ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
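A toy illustration of the field layout Eugene describes: admin_state_up lives on the pure-DB object, while provisioning and operating status live on the association (join) object, so a shared entity can be healthy on one load balancer and failed on another. Names and status values are assumptions, not the agreed model.

    # Illustrative field layout only; names and values are assumptions.

    class Pool(object):
        """Pure-DB object: it only carries the admin flag."""
        def __init__(self):
            self.admin_state_up = True

    class LoadBalancerPoolBinding(object):
        """Association object: per-deployment statuses live here, so a pool
        shared by two load balancers can be ACTIVE on one and ERROR on the
        other without the two statuses fighting each other."""
        def __init__(self, loadbalancer_id, pool_id):
            self.loadbalancer_id = loadbalancer_id
            self.pool_id = pool_id
            self.provisioning_status = 'PENDING_CREATE'  # PENDING_* / ACTIVE / ERROR
            self.operating_status = 'OFFLINE'            # ONLINE / OFFLINE / DEGRADED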
Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation
Mike, Thanks a lot for your response! Some comments:

> There's some in-Python filtering following it which does not seem necessary; the "alloc.vxlan_vni not in vxlan_vnis" phrase could just as well be a SQL "NOT IN" expression.

There we have to do a specific set intersection between the configured ranges and existing allocations. That could be done in sql, but that would certainly lead to a huge sql query text, as the full vxlan range could consist of 16 million ids.

> The synchronize_session="fetch" is certainly a huge part of the time spent here

You've actually made a good point about synchronize_session="fetch", which was obviously misused by me. It seems to save up to 40% of the plain deleting time. I've fixed that and got some speedup with deletes for both mysql and postgres, which reduced the difference between the chunked and non-chunked versions (times in seconds):

    50k vnis to add/delete   Pg adding  Pg deleting  Pg total  Mysql adding  Mysql deleting  Mysql total
    non-chunked sql              22         15          37          15            15             30
    chunked in 100               20         13          33          14            14             28

Results of the chunked and non-chunked versions look closer, but the gap increases with the vni range size (based on a few tests with a 150k vni range). So I'm going to fix the chunked version that is on review now. If you think the benefit isn't worth the complexity - please let me know.

Thanks, Eugene.

On Mon, Jun 9, 2014 at 1:33 AM, Mike Bayer mba...@redhat.com wrote: On Jun 7, 2014, at 4:38 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi folks, There was a small discussion about the better way of doing sql operations for vni synchronization with the config. The initial proposal was to handle those in chunks. Carl also suggested to issue a single sql query. I did some testing with mysql and postgres. I've tested the following scenario: the vxlan range is changed from 50000:150000 to 0:100000 and vice versa. That involves adding and deleting 50k vnis in each test. Here are the numbers (times in seconds):

    50k vnis to add/delete   Pg adding  Pg deleting  Pg total  Mysql adding  Mysql deleting  Mysql total
    non-chunked sql              23         22          45          14            20             34
    chunked in 100               20         17          37          14            17             31

I've done about 5 tries to get each number to minimize the random floating factor (due to swaps, disc or cpu activity or other factors). It might be surprising that issuing multiple sql statements instead of one big one is a little bit more efficient, so I would appreciate it if someone could reproduce those numbers. Also I'd like to note that the part of the code that iterates over vnis fetched from the db takes 10 seconds on both mysql and postgres and is included in the deleting vnis numbers. In other words, the difference between multiple DELETE sql statements and a single one is even bigger (in percent) than these numbers show. The code which I used to test is here: http://paste.openstack.org/show/83298/ Right now the chunked version is commented out, so to switch between versions some lines should be commented and some - uncommented.

I've taken a look at this, though I'm not at the point where I have things set up to run things like this within full context, and I don't know that I have any definitive statements to make, but I do have some suggestions:

1. I do tend to chunk things a lot, selects, deletes, inserts, though the chunk size I work with is typically more like 1000, rather than 100.
When chunking, we’re looking to select a size that doesn’t tend to overload the things that are receiving the data (query buffers, structures internal to both SQLAlchemy as well as the DBAPI and the relational database), but at the same time doesn’t lead to too much repetition on the Python side (where of course there’s a lot of slowness). 2. Specifically regarding “WHERE x IN (…..)”, I always chunk those. When we use IN with a list of values, we’re building an actual SQL string that becomes enormous. This puts strain on the database’s query engine that is not optimized for SQL strings that are hundreds of thousands of characters long, and on some backends this size is limited; on Oracle, there’s a limit of 1000 items. So I’d always chunk this kind of thing. 3. I’m not sure of the broader context of this code, but in fact placing a literal list of items in the IN in this case seems unnecessary; the “vmis_to_remove” list itself was just SELECTed two lines above. There’s some in-Python filtering following it which does not seem necessary; the alloc.vxlan_vni not in vxlan_vnis” phrase could just as well be a SQL “NOT IN” expression. Not sure if determination of the “.allocated” flag can be done in SQL, if that’s a plain column, then certainly.Again not sure if this is just an artifact of how the test is done here, but if the goal is to optimize this code for speed, doing a DELETE…WHERE .. IN (SELECT ..) is probably better. I see that the SELECT is using a lockmode, but it would seem that if just the rows we care to DELETE are inlined within the DELETE itself this wouldn’t be needed either. It’s likely that everything in #3 is pretty
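A minimal sketch (SQLAlchemy 1.4+ style) of the chunked DELETE pattern Mike recommends. The model below is a stand-in for the ml2 vxlan allocation table and the chunk size of 1000 follows his suggestion; none of this is the actual patch under review.

    import sqlalchemy as sa
    from sqlalchemy import orm

    Base = orm.declarative_base()

    class VxlanAllocation(Base):
        """Minimal stand-in for the ml2 vxlan allocations table."""
        __tablename__ = 'ml2_vxlan_allocations'
        vxlan_vni = sa.Column(sa.Integer, primary_key=True, autoincrement=False)
        allocated = sa.Column(sa.Boolean, nullable=False, default=False)

    def chunked(values, size=1000):
        """Yield successive lists of at most `size` items."""
        values = list(values)
        for i in range(0, len(values), size):
            yield values[i:i + size]

    def delete_stale_vnis(session, vnis_to_remove, chunk_size=1000):
        """DELETE in bounded IN() chunks rather than one enormous statement,
        keeping both the generated SQL string and the per-statement work small."""
        for chunk in chunked(vnis_to_remove, chunk_size):
            (session.query(VxlanAllocation)
             .filter(VxlanAllocation.vxlan_vni.in_(chunk),
                     VxlanAllocation.allocated == sa.false())
             .delete(synchronize_session=False))
        session.commit()

Passing synchronize_session=False avoids the extra "fetch" work discussed above, at the cost of not updating objects already loaded in the session, which is fine for a one-shot sync job.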
Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation
Hi folks,

There was a small discussion about the better way of doing sql operations for vni synchronization with the config. The initial proposal was to handle those in chunks. Carl also suggested to issue a single sql query. I did some testing with mysql and postgres. I've tested the following scenario: the vxlan range is changed from 50000:150000 to 0:100000 and vice versa. That involves adding and deleting 50k vnis in each test. Here are the numbers (times in seconds):

    50k vnis to add/delete   Pg adding  Pg deleting  Pg total  Mysql adding  Mysql deleting  Mysql total
    non-chunked sql              23         22          45          14            20             34
    chunked in 100               20         17          37          14            17             31

I've done about 5 tries to get each number to minimize the random floating factor (due to swaps, disc or cpu activity or other factors). It might be surprising that issuing multiple sql statements instead of one big one is a little bit more efficient, so I would appreciate it if someone could reproduce those numbers. Also I'd like to note that the part of the code that iterates over vnis fetched from the db takes 10 seconds on both mysql and postgres and is included in the deleting vnis numbers. In other words, the difference between multiple DELETE sql statements and a single one is even bigger (in percent) than these numbers show. The code which I used to test is here: http://paste.openstack.org/show/83298/ Right now the chunked version is commented out, so to switch between versions some lines should be commented and some - uncommented.

Thanks, Eugene.

P.S. I'm also afraid that issuing one big sql statement (and it will be megabytes big) could lead to timeouts/deadlocks just because it will take too much time; however, I'm not 100% sure about that, it's just my concern.

On Thu, Jun 5, 2014 at 1:06 PM, Xurong Yang ido...@gmail.com wrote: great. I will do more tests based on Eugene Nikanorov's modification. *Thanks,* 2014-06-05 11:01 GMT+08:00 Isaku Yamahata isaku.yamah...@gmail.com: Wow great. I think the same applies to the gre type driver, so we should create a similar one after the vxlan case is resolved. thanks, On Thu, Jun 05, 2014 at 12:36:54AM +0400, Eugene Nikanorov enikano...@mirantis.com wrote: We hijacked the vxlan initialization performance thread with ipam! :) I've tried to address the initial problem with some simple sqla stuff: https://review.openstack.org/97774 With sqlite it gives ~3x benefit over the existing code in master. Need to do a little bit more testing with real backends to make sure parameters are optimal. Thanks, Eugene. On Thu, Jun 5, 2014 at 12:29 AM, Carl Baldwin c...@ecbaldwin.net wrote: Yes, memcached is a candidate that looks promising. First things first, though. I think we need the abstraction of an ipam interface merged. That will take some more discussion and work on its own. Carl On May 30, 2014 4:37 PM, Eugene Nikanorov enikano...@mirantis.com wrote: I was thinking it would be a separate process that would communicate over the RPC channel or something. memcached? Eugene. On Sat, May 31, 2014 at 2:27 AM, Carl Baldwin c...@ecbaldwin.net wrote: Eugene, That was part of the whole new set of complications that I dismissively waved my hands at. :) I was thinking it would be a separate process that would communicate over the RPC channel or something. More complications come when you think about making this process HA, etc. It would mean going over RPC to rabbit to get an allocation which would be slow. But the current implementation is slow. At least going over RPC is greenthread friendly where going to the database doesn't seem to be.
Carl On Fri, May 30, 2014 at 2:56 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi Carl, The idea of in-memory storage was discussed for similar problem, but might not work for multiple server deployment. Some hybrid approach though may be used, I think. Thanks, Eugene. On Fri, May 30, 2014 at 8:53 PM, Carl Baldwin c...@ecbaldwin.net wrote: This is very similar to IPAM... There is a space of possible ids or addresses that can grow very large. We need to track the allocation of individual ids or addresses from that space and be able to quickly come up with a new allocations and recycle old ones. I've had this in the back of my mind for a week or two now. A similar problem came up when the database would get populated with the entire free space worth of ip addresses to reflect the availability of all of the individual addresses. With a large space (like an ip4 /8 or practically any ip6 subnet) this would take a very long time or never finish. Neutron was a little smarter about this. It compressed availability in to availability ranges in a separate table. This solved the original problem but is not problem free. It turns out that writing database operations to manipulate both
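For context, the reconciliation step being benchmarked boils down to a set difference between the configured VNI ranges and the rows already in the table. A toy version, assuming the range change in the test was 50000:150000 -> 0:100000 (which is consistent with the "50k vnis to add/delete" figure):

    def vnis_in_ranges(vni_ranges):
        """Expand configured [(start, end), ...] tuples into a set of VNIs."""
        vnis = set()
        for start, end in vni_ranges:
            vnis |= set(range(start, end + 1))
        return vnis

    old = vnis_in_ranges([(50000, 150000)])   # previously configured range
    new = vnis_in_ranges([(0, 100000)])       # newly configured range

    to_add = new - old       # VNIs that need allocation rows created
    to_remove = old - new    # stale rows to delete (unless still allocated)
    assert len(to_add) == len(to_remove) == 50000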
Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas
If a user makes a change to a secret Can we just disable that by making LBaaS a separate user so it would store secrets under LBaaS 'fake' tenant id? Eugene. On Sun, Jun 8, 2014 at 7:29 AM, Jain, Vivek vivekj...@ebay.com wrote: +1 for #2. In addition, I think it would be nice if barbican maintains versioned data on updates. Which means consumer of barbican APIs can request for data from older version if needed. This can address concerns expressed by German. For example if certificates were updated on barbican but somehow update is not compatible with load balancer device, then lbaas API user gets an option to fall back to older working certificate. That will avoid downtime of lbaas managed applications. Thanks, Vivek On 6/6/14, 3:52 PM, Eichberger, German german.eichber...@hp.com wrote: Jorge + John, I am most concerned with a user changing his secret in barbican and then the LB trying to update and causing downtime. Some users like to control when the downtime occurs. For #1 it was suggested that once the event is delivered it would be up to a user to enable an auto-update flag. In the case of #2 I am a bit worried about error cases: e.g. uploading the certificates succeeds but registering the loadbalancer(s) fails. So using the barbican system for those warnings might not as fool proof as we are hoping. One thing I like about #2 over #1 is that it pushes a lot of the information to Barbican. I think a user would expect when he uploads a new certificate to Barbican that the system warns him right away about load balancers using the old cert. With #1 he might get an e-mails from LBaaS telling him things changed (and we helpfully updated all affected load balancers) -- which isn't as immediate as #2. If we implement an auto-update flag for #1 we can have both. User's who like #2 juts hit the flag. Then the discussion changes to what we should implement first and I agree with Jorge + John that this should likely be #2. German -Original Message- From: Jorge Miramontes [mailto:jorge.miramon...@rackspace.com] Sent: Friday, June 06, 2014 3:05 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas Hey John, Correct, I was envisioning that the Barbican request would not be affected, but rather, the GUI operator or API user could use the registration information to do so should they want to do so. Cheers, --Jorge On 6/6/14 4:53 PM, John Wood john.w...@rackspace.com wrote: Hello Jorge, Just noting that for option #2, it seems to me that the registration feature in Barbican would not be required for the first version of this integration effort, but we should create a blueprint for it nonetheless. As for your question about services not registering/unregistering, I don't see an issue as long as the presence or absence of registered services on a Container/Secret does not **block** actions from happening, but rather is information that can be used to warn clients through their processes. For example, Barbican would still delete a Container/Secret even if it had registered services. Does that all make sense though? Thanks, John From: Youcef Laribi [youcef.lar...@citrix.com] Sent: Friday, June 06, 2014 2:47 PM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas +1 for option 2. 
In addition as an additional safeguard, the LBaaS service could check with Barbican when failing to use an existing secret to see if the secret has changed (lazy detection). Youcef -Original Message- From: Jorge Miramontes [mailto:jorge.miramon...@rackspace.com] Sent: Friday, June 06, 2014 12:16 PM To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [Neutron][LBaaS] Barbican Neutron LBaaS Integration Ideas Hey everyone, Per our IRC discussion yesterday I'd like to continue the discussion on how Barbican and Neutron LBaaS will interact. There are currently two ideas in play and both will work. If you have another idea please free to add it so that we may evaluate all the options relative to each other. Here are the two current ideas: 1. Create an eventing system for Barbican that Neutron LBaaS (and other services) consumes to identify when to update/delete updated secrets from Barbican. For those that aren't up to date with the Neutron LBaaS API Revision, the project/tenant/user provides a secret (container?) id when enabling SSL/TLS functionality. * Example: If a user makes a change to a secret/container in Barbican then Neutron LBaaS will see an event and take the appropriate action. PROS: - Barbican is going to create an eventing system regardless so it will be supported. - Decisions are made on
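A very rough sketch of how option #2 plus Youcef's lazy-detection safeguard could look from the LBaaS side. The Barbican client methods shown here (get_container, register_consumer) are stand-ins, not the real barbicanclient API.

    # Illustrative only: 'barbican' is an assumed client wrapper.

    class CertManager(object):
        def __init__(self, barbican):
            self.barbican = barbican
            self.in_use = {}   # loadbalancer_id -> (container_id, fingerprint)

        def attach(self, loadbalancer_id, container_id):
            cert = self.barbican.get_container(container_id)
            # Option #2: tell Barbican who consumes this container so it can
            # warn (not block) when the secret is changed or deleted.
            self.barbican.register_consumer(container_id, 'lbaas', loadbalancer_id)
            self.in_use[loadbalancer_id] = (container_id, cert.fingerprint)
            return cert

        def refresh_if_changed(self, loadbalancer_id):
            """Lazy detection: on failure or redeploy, re-read the container
            and only push new certificates if it actually changed."""
            container_id, old_fp = self.in_use[loadbalancer_id]
            cert = self.barbican.get_container(container_id)
            if cert.fingerprint != old_fp:
                self.in_use[loadbalancer_id] = (container_id, cert.fingerprint)
                return cert
            return None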
Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation
Unit tests use sqlite db backend, so it might be much faster than production environment where DB is on different server. Eugene. On Wed, Jun 4, 2014 at 11:14 AM, Wang, Yalei yalei.w...@intel.com wrote: Hi, Xurong, How do you test it in postgresql? Do you use tox to do a unittest and get the result(1h)? /Yalei *From:* Xurong Yang [mailto:ido...@gmail.com] *Sent:* Thursday, May 29, 2014 6:01 PM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation Hi, Folks, When we configure VXLAN range [1,16M], neutron-server service costs long time and cpu rate is very high(100%) when initiation. One test base on postgresql has been verified: more than 1h when VXLAN range is [1, 1M]. So, any good solution about this performance issue? Thanks, Xurong Yang ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Your suggestions in the BP
Hi Sam, Eugene, please comment on the migration process bellow. I think that closing down the status handling should be done in phase 1. I don't mind. If you're talking about provisioning status then such status (if we're still going to maintain it per each kind of object) goes to various associations: loadbalancer-listener, or loadbalancer-listener-pool, etc. Not a big deal of course, it was just my initial intent to limit phase #1 as much as possible. Missing to do so, will create tests and other depending workflows that assume the current status field, which will add a technology debt to this new code. I'd say it would depend on the strategy options you're suggestion below. As far as bw compatibility is concerned (if it's concerned at all), we have to support existing status field, so that would not be any additional debt. Migration and co-existence: I think that it would be better to have the new object model and API done in a way that does not break existing code, and then switch the old api to redirect to the new api. Basically this means creating another lbaas plugin, that expose existing lbaas api extension. I'm not sure how this can be done considering the difference between new proposed api and existing api. This might be done in one of the two ways bellow: 1. Rename all objects in the new api so you have a clear demarcation point. This might be sufficient. I'm not sure how this could be done, can you explain? I actually would consider changing the prefix to /v3/ to not to deal with any renamings, that would require some minor refactoring on extension framework side. 2. Copy the existing LBaaS extension and create a new-lbaas extension with new object names, then create a new old lbaas extension that has the old API but redirect to the new API I also don't fully understand this, please explain. Doing 2, can allow co-existence of old code with old drivers until new code with new drivers can take its place. New extension + new plugin, is that what you are suggesting? To me it would be the cleanest and the most simple way to execute the transition, but... i'm not sure it was a consensus on design session. Thanks, Eugene. Regards, -Sam. -Original Message- From: Brandon Logan [mailto:brandon.lo...@rackspace.com] Sent: Friday, May 30, 2014 6:38 PM To: Samuel Bercovici Subject: Your suggestions in the BP Hi Sam! Thanks for the suggestions. I don't think the different statuses should be addressed in the blueprint. I think it would be better left to have its own focus in its own blueprint. I feel the same way about the subnet_id. I think if this blueprint focuses just on the object model change and the migration to it, it would be better. As for having a v2 version of all the table or entirely new tables. Are you suggesting just keeping the old API going to the old object models and database tables? Also, if say I renamed the pool object to nodepool (I prefer that over group), then are you also suggesting the new API will have a /nodepools resource along with the object model NodePool and database table nodepool? I'm intrigued by this idea, but wasn't totally clear on what you were suggesting. Thanks, Brandon ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [All] Disabling Pushes of new Gerrit Draft Patchsets
Hi, I might be posting a question to a wrong thread, but what would be the option to push a patch that I would like to share only with certain group of people. In other words, is there still an option to push non-public patches? I wouldn't like such patches to affect gerrit stream or trigger CIs, but gerrit could still be used for regular reviewing process. Thanks, Eugene. On Sat, May 31, 2014 at 12:51 AM, Sergey Lukjanov slukja...@mirantis.com wrote: Yay! No more weird CR chains. On Fri, May 30, 2014 at 9:32 PM, Clark Boylan clark.boy...@gmail.com wrote: On Wed, May 21, 2014 at 4:24 PM, Clark Boylan clark.boy...@gmail.com wrote: Hello everyone, Gerrit has long supported Draft patchsets, and the infra team has long recommended against using them as they are a source of bugs and confusion (see below for specific details if you are curious). The newer version of Gerrit that we recently upgraded to allows us to prevent people from pushing new Draft patchsets. We will take advantage of this and disable pushes of new Drafts on Friday May 30, 2014. The impact of this change should be small. You can use the Work in Progress state instead of Drafts for new patchsets. Any existing Draft patchsets will remain in a Draft state until it is published. Now for the fun details on why drafts are broken. * Drafts appear to be secure but they offer no security. This is bad for user expectations and may expose data that shouldn't be exposed. * Draft patchsets pushed after published patchsets confuse reviewers as they cannot vote with a value because the latest patchset is hidden. * Draft patchsets confuse the Gerrit event stream output making it difficult for automated tooling to do the correct thing with Drafts. * Child changes of Drafts will fail to merge without explanation. Let us know if you have any questions, Clark (on behalf of the infra team) Heads up everyone, this is now in effect and pushes of new draft patchsets have been disabled. Thanks, Clark ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Sincerely yours, Sergey Lukjanov Sahara Technical Lead (OpenStack Data Processing) Principal Software Engineer Mirantis Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation
Hi Carl, The idea of in-memory storage was discussed for similar problem, but might not work for multiple server deployment. Some hybrid approach though may be used, I think. Thanks, Eugene. On Fri, May 30, 2014 at 8:53 PM, Carl Baldwin c...@ecbaldwin.net wrote: This is very similar to IPAM... There is a space of possible ids or addresses that can grow very large. We need to track the allocation of individual ids or addresses from that space and be able to quickly come up with a new allocations and recycle old ones. I've had this in the back of my mind for a week or two now. A similar problem came up when the database would get populated with the entire free space worth of ip addresses to reflect the availability of all of the individual addresses. With a large space (like an ip4 /8 or practically any ip6 subnet) this would take a very long time or never finish. Neutron was a little smarter about this. It compressed availability in to availability ranges in a separate table. This solved the original problem but is not problem free. It turns out that writing database operations to manipulate both the allocations table and the availability table atomically is very difficult and ends up being very slow and has caused us some grief. The free space also gets fragmented which degrades performance. This is what led me -- somewhat reluctantly -- to change how IPs get recycled back in to the free pool which hasn't been very popular. I wonder if we can discuss a good pattern for handling allocations where the free space can grow very large. We could use the pattern for the allocation of both IP addresses, VXlan ids, and other similar resource spaces. For IPAM, I have been entertaining the idea of creating an allocation agent that would manage the availability of IPs in memory rather than in the database. I hesitate, because that brings up a whole new set of complications. I'm sure there are other potential solutions that I haven't yet considered. The L3 subteam is currently working on a pluggable IPAM model. Once the initial framework for this is done, we can more easily play around with changing the underlying IPAM implementation. Thoughts? Carl On Thu, May 29, 2014 at 4:01 AM, Xurong Yang ido...@gmail.com wrote: Hi, Folks, When we configure VXLAN range [1,16M], neutron-server service costs long time and cpu rate is very high(100%) when initiation. One test base on postgresql has been verified: more than 1h when VXLAN range is [1, 1M]. So, any good solution about this performance issue? Thanks, Xurong Yang ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
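A toy, in-memory version of the "availability ranges" compression Carl describes. It shows why the approach is attractive (allocation is cheap and the table stays small) while glossing over the hard part he points out: doing the same thing atomically in SQL alongside the allocations table.

    class AvailabilityRange(object):
        """One contiguous block of free ids/addresses, e.g. first=10, last=254."""
        def __init__(self, first, last):
            self.first = first
            self.last = last

    def allocate(ranges):
        """Take one value from the free space, shrinking (or dropping) the
        range it came from."""
        for r in ranges:
            if r.first <= r.last:
                value = r.first
                r.first += 1
                if r.first > r.last:
                    ranges.remove(r)
                return value
        raise RuntimeError('no free values left')

    def release(ranges, value):
        """Recycle a value as a one-element range; adjacent ranges can be
        merged later to limit the fragmentation Carl mentions."""
        ranges.append(AvailabilityRange(value, value))

    free = [AvailabilityRange(1, 100)]
    vni = allocate(free)     # -> 1, and the free range shrinks to 2..100
    release(free, vni)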
Re: [openstack-dev] [Neutron] One performance issue about VXLAN pool initiation
I was thinking it would be a separate process that would communicate over the RPC channel or something. memcached? Eugene. On Sat, May 31, 2014 at 2:27 AM, Carl Baldwin c...@ecbaldwin.net wrote: Eugene, That was part of the whole new set of complications that I dismissively waved my hands at. :) I was thinking it would be a separate process that would communicate over the RPC channel or something. More complications come when you think about making this process HA, etc. It would mean going over RPC to rabbit to get an allocation which would be slow. But the current implementation is slow. At least going over RPC is greenthread friendly where going to the database doesn't seem to be. Carl On Fri, May 30, 2014 at 2:56 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi Carl, The idea of in-memory storage was discussed for similar problem, but might not work for multiple server deployment. Some hybrid approach though may be used, I think. Thanks, Eugene. On Fri, May 30, 2014 at 8:53 PM, Carl Baldwin c...@ecbaldwin.net wrote: This is very similar to IPAM... There is a space of possible ids or addresses that can grow very large. We need to track the allocation of individual ids or addresses from that space and be able to quickly come up with a new allocations and recycle old ones. I've had this in the back of my mind for a week or two now. A similar problem came up when the database would get populated with the entire free space worth of ip addresses to reflect the availability of all of the individual addresses. With a large space (like an ip4 /8 or practically any ip6 subnet) this would take a very long time or never finish. Neutron was a little smarter about this. It compressed availability in to availability ranges in a separate table. This solved the original problem but is not problem free. It turns out that writing database operations to manipulate both the allocations table and the availability table atomically is very difficult and ends up being very slow and has caused us some grief. The free space also gets fragmented which degrades performance. This is what led me -- somewhat reluctantly -- to change how IPs get recycled back in to the free pool which hasn't been very popular. I wonder if we can discuss a good pattern for handling allocations where the free space can grow very large. We could use the pattern for the allocation of both IP addresses, VXlan ids, and other similar resource spaces. For IPAM, I have been entertaining the idea of creating an allocation agent that would manage the availability of IPs in memory rather than in the database. I hesitate, because that brings up a whole new set of complications. I'm sure there are other potential solutions that I haven't yet considered. The L3 subteam is currently working on a pluggable IPAM model. Once the initial framework for this is done, we can more easily play around with changing the underlying IPAM implementation. Thoughts? Carl On Thu, May 29, 2014 at 4:01 AM, Xurong Yang ido...@gmail.com wrote: Hi, Folks, When we configure VXLAN range [1,16M], neutron-server service costs long time and cpu rate is very high(100%) when initiation. One test base on postgresql has been verified: more than 1h when VXLAN range is [1, 1M]. So, any good solution about this performance issue? 
Thanks, Xurong Yang ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron][Flavors] Flavor framework implementation questions
Hi, I have two questions that were briefly discussed previously. Both of them still cause discussion within the advanced services community, so I'd like to reach a final clarification in this email thread.

1. Usage of the Service Type Framework. I think right now there is a consensus that letting the user choose a vendor is not what neutron should do. To be more specific, having a public 'provider' attribute on a resource may create certain problems for operators, because it binds the resource and the implementation so tightly that the implementation can't be changed without changing user configuration. The question that was discussed is whether it's ok to remove this public attribute and still use the rest of the framework. I think the answer is yes: the binding between a resource and the implementing driver is fine if it's read-only and visible only to the admin. It still serves as a hint for dispatching requests to the driver and may also help the operator troubleshoot issues. I'd like to hear other opinions on this because there are patches for vpn and fwaas that use STF with the difference described above.

2. Choosing the implementation and backend. This topic was briefly touched on at the design session dedicated to flavors. The current proposal, discussed at the advanced services meetings, is to employ 2-step scheduling as described in the following sample code: https://review.openstack.org/#/c/83055/7/neutron/tests/unit/test_flavors.py lines 38-56. In my opinion the proposed way has the following benefits:
- it delegates the actual backend choice to the driver. This way different vendors are not required to agree on how to bind a resource to a backend, or on any other common implementation details that usually lead to lots of discussion.
- it allows different configurable, vendor-specific algorithms to be used when binding a resource to a backend.
- some drivers don't have a notion of a backend, because external entities manage backends for them.
Another proposal is single-step scheduling, where each driver exposes its list of backends and the scheduler just chooses a backend based on the capabilities in the flavor. I'd like to better understand the benefits of the second approach (this is all implementation, so from the user's standpoint it makes no difference).

So please add your opinions on those questions. My suggestion is also to have a short 30-minute meeting sometime this or next week to finalize these questions. There are several patches and blueprints that depend on them. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
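To make the two scheduling options above easier to compare, here is a purely illustrative Python sketch of the 2-step approach; it is not the code from the review linked above, and the names (ServiceDriver, choose_backend, schedule) are invented for this example.

    # Step 1: the flavor framework picks a driver whose advertised
    #         capabilities satisfy the flavor.
    # Step 2: the chosen driver picks a concrete backend on its own.

    class ServiceDriver(object):
        capabilities = {}

        def choose_backend(self, resource):
            # Vendor-specific: may consult the driver's own device
            # inventory, an external manager, or nothing at all.
            raise NotImplementedError


    def schedule(flavor, drivers, resource):
        # Step 1: driver selection based only on capabilities.
        for driver in drivers:
            if all(driver.capabilities.get(key) == value
                   for key, value in flavor.items()):
                # Step 2: backend selection is delegated to the driver.
                return driver, driver.choose_backend(resource)
        raise RuntimeError("no driver satisfies flavor %s" % flavor)

The single-step alternative would instead have every driver expose its backend list and let the scheduler pick a backend directly from the flavor's capabilities.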
[openstack-dev] [Neutron][LBaaS] Weekly subteam meeting Thursday, 14-00 UTC
Hi folks, Let's keep our regular meetings on Thursday, 14-00 UTC The agenda for the meeting: https://wiki.openstack.org/wiki/Network/LBaaS#Agenda Sorry for the short notice, I'm still catching up with everything. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] Supporting retries in neutronclient
In fact, nova should be careful about changing number of retries for neutron client. It's known that under significant load (people test serial VM creation) neutron client may timeout on POST operation which does port creation; retrying this again leads to multiple fixed IPs assigned to a VM Thanks, Eugene. On Wed, May 28, 2014 at 12:09 AM, Kyle Mestery mest...@noironetworks.comwrote: I'm not aware of any such change at the moment, no. On Tue, May 27, 2014 at 3:06 PM, Paul Ward wpw...@us.ibm.com wrote: Great! Do you know if there's any corresponding nova changes to support this as a conf option that gets passed in to this new parm? Kyle Mestery mest...@noironetworks.com wrote on 05/27/2014 01:56:12 PM: From: Kyle Mestery mest...@noironetworks.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 05/27/2014 02:00 PM Subject: Re: [openstack-dev] [neutron] Supporting retries in neutronclient On Tue, May 27, 2014 at 12:48 PM, Paul Ward wpw...@us.ibm.com wrote: Currently, neutronclient is hardcoded to only try a request once in retry_request by virtue of the fact that it uses self.retries as the retry count, and that's initialized to 0 and never changed. We've seen an issue where we get an ssl handshaking error intermittently (seems like more of an ssl bug) and a retry would probably have worked. Yet, since neutronclient only tries once and gives up, it fails the entire operation. Here is the code in question: https://github.com/openstack/python-neutronclient/blob/master/ neutronclient/v2_0/client.py#L1296 Does anybody know if there's some explicit reason we don't currently allow configuring the number of retries? If not, I'm inclined to propose a change for just that. There is a review to address this in place now [1]. I've given a -1 due to a trivial reason which I hope Jakub will address soon so we can land this patch in the client code. Thanks, Kyle [1] https://review.openstack.org/#/c/90104/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
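For reference, the change under review would make the retry count something the caller can configure instead of a hard-coded zero. Assuming it lands roughly as proposed, usage might look like the sketch below; the exact parameter name and plumbing are assumptions rather than the final merged API, and the credentials are placeholders. As noted above, retrying non-idempotent POSTs (such as port creation) is risky.

    from neutronclient.v2_0 import client

    # Hypothetical: pass a retry count when constructing the client.
    neutron = client.Client(username='admin', password='secret',
                            tenant_name='service',
                            auth_url='http://controller:5000/v2.0',
                            retries=3)

    # Reasonable to retry: GETs are idempotent.
    networks = neutron.list_networks()

    # Risky to retry: a timed-out POST may still have succeeded on the
    # server, so a blind retry can create duplicate resources (e.g. a
    # second port, and therefore a second fixed IP on the VM).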
Re: [openstack-dev] [Neutron][LBaaS] Unanswered questions in object model refactor blueprint
Hi folks, From lbaas design session I have an impression that we need to have a kind of merge of existing API and new proposed API. E.g. the blueprint implementation should not be brand new API and DB backend. So I'd say from implementation perspective we may want to minimize amount of changes for the patch that will bring existing model to a new design. Such things as relationships M:N instead of 1:N are usually complicate things a lot, so I'd probably consider to implement them in following patches. Changing 1:N to 1:1 (pools - healthmons) falls into another category: we need to deprecate this rather than remove at once. Same applies for status/status_description: we need to deprecate those. Thanks, Eugene. On Wed, May 28, 2014 at 7:36 AM, Doug Wiegley do...@a10networks.com wrote: Hi Stephen, Doug: What do you think of the idea of having both IPv4 and IPv6 attributes on a 'load balancer' object? One doesn't need to have a single appliance serving both types of addresses for the listener, but there's certainly a chance (albeit small) to hit an async scenario if they're not. I made the same suggestion in the bp review , so I have no issue with it. :-) I can think of at least one backend that pairs ip:port in their listener object, so the driver for that can implement the v4/v6 model natively with no issue (but not a true N:N). For the ones that don’t, at least the race condition outlined earlier is limited to two objects, which is ok if the use case is common enough to warrant it. I’d still prefer that issue be handled as far up the chain as possible, but I’d be ok. Doug From: Stephen Balukoff sbaluk...@bluebox.net Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Tuesday, May 27, 2014 at 8:42 PM To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Unanswered questions in object model refactor blueprint Hi y'all! On Tue, May 27, 2014 at 12:32 PM, Brandon Logan brandon.lo...@rackspace.com wrote: Referencing this blueprint: https://review.openstack.org/#/c/89903/5/specs/juno/lbaas-api-and-objmodel-improvement.rst Anyone who has suggestions to possible issues or can answer some of these questions please respond. 1. LoadBalancer to Listener relationship M:N vs 1:N The main reason we went with the M:N was so IPv6 could use the same listener as IPv4. However this can be accomplished by the user just creating a second listener and pool with the same configuration. This will end up being a bad user experience when the listener and pool configuration starts getting complex (adding in TLS, health monitors, SNI, etc). A good reason to not do the M:N is because the logic on might get complex when dealing with status. I'd like to get people's opinions on this on whether we should do M:N or just 1:N. Another option, is to just implement 1:N right now and later implement the M:N in another blueprint if it is decided that the user experience suffers greatly. My opinion: I like the idea of leaving it to another blueprint to implement. However, we would need to watch out for any major architecture changes in the time itis not implemented that could make this more difficult than what it needs to be. Is there such a thing as a 'possibly planned but not implemented design' to serve as a placeholder when considering other in-parallel blueprints and designs which could potentially conflict with the ability to implement an anticipated design like this? 
(I'm guessing no. I really wish we had a better design tracking tool than blueprint.) Anyway, I don't have a problem with implementing 1:N right now. But, I do want to point out: The one and only common case I've seen where listener re-use actually makes a lot of sense (IPv4 and IPv6 for same listener) could be alleviated by adding separate ipv4 and ipv6 attributes to the loadbalancer object. I believe this was shot down when people were still calling it a VIP for philosophical reasons. Are people more open to this idea now that we're calling the object a 'load balancer'? ;) Does anyone have any other use cases where listener re-use makes sense? 2. Pool to Health Monitor relationship 1:N vs 1:1 Currently, I believe this is 1:N however it was suggested to deprecate this in favor of 1:1 by Susanne and Kyle agreed. Are there any objections to channging to 1:1? My opinion: I'm for 1:1 as long as there aren't any major reasons why there needs to be 1:N. Yep, totally on-board with 1:1 for pool and health monitor. 3. Does the Pool object need a status field now that it is a pure logical object? My opinion: I don't think it needs the status field. I think the LoadBalancer object may be the only thing that needs a status, other than the pool members for health monitoring. I might be corrected on this though.
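For readers less familiar with how these cardinalities look in code, here is a rough, hypothetical SQLAlchemy sketch of the 1:N loadbalancer-to-listener and 1:1 pool-to-health-monitor relationships being discussed. It is not the actual Neutron LBaaS schema; all table and attribute names are illustrative only.

    import sqlalchemy as sa
    from sqlalchemy import orm
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class LoadBalancer(Base):
        __tablename__ = 'loadbalancers'
        id = sa.Column(sa.String(36), primary_key=True)
        # 1:N -- each listener belongs to exactly one load balancer.
        listeners = orm.relationship('Listener', backref='loadbalancer')

    class Listener(Base):
        __tablename__ = 'listeners'
        id = sa.Column(sa.String(36), primary_key=True)
        protocol_port = sa.Column(sa.Integer)
        loadbalancer_id = sa.Column(sa.String(36),
                                    sa.ForeignKey('loadbalancers.id'))

    class Pool(Base):
        __tablename__ = 'pools'
        id = sa.Column(sa.String(36), primary_key=True)
        # 1:1 -- at most one health monitor per pool (uselist=False).
        healthmonitor = orm.relationship('HealthMonitor', uselist=False,
                                         backref='pool')

    class HealthMonitor(Base):
        __tablename__ = 'healthmonitors'
        id = sa.Column(sa.String(36), primary_key=True)
        pool_id = sa.Column(sa.String(36), sa.ForeignKey('pools.id'),
                            unique=True)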
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Hi folks, Agree with Kyle, you may go ahead and update the spec on review to reflect the design discussed at the summit. Thanks, Eugene. On Tue, May 20, 2014 at 6:07 PM, Kyle Mestery mest...@noironetworks.comwrote: On Mon, May 19, 2014 at 9:28 PM, Brandon Logan brandon.lo...@rackspace.com wrote: In the API meeting at the summit, Mark McClain mentioned that the existing API should be supported, but deprecated so as not to interrupt those using the existing API. To me, that sounds like the object model can change but there needs to be some kind of adapter/translation layer that modifies any existing current API calls to the new object model. So currently there is this blueprint spec that Eugene submitted: https://review.openstack.org/#/c/89903/3/specs/juno/lbaas-api-and-objmodel-improvement.rst That is implementing the object model with VIP as root object. I suppose this needs to be changed to have the changed we agreed on at the summit. Also, this blueprint should also cover the layer in which to the existing API calls get mapped to this object model. My question is to anyone who knows for certain: should this blueprint just be changed to reflect the new object model agreed on at the summit or should a new blueprint spec be created? If it should just be changed should it wait until Eugene gets back from vacation since he's the one who created this blueprint spec? If you think it makes sense to change this existing document, I would say we should update Eugene's spec mentioned above to reflect what was agreed upon at the summit. I know Eugene is on vacation this week, so in this case it may be ok for you to push a new revision of his specification while he's out, updating it to reflect the object model changes. This way we can make some quick progress on this front. We won't approve this until he gets back and has a chance to review it. Let me know if you need help in pulling this spec down and pushing a new version. Thanks, Kyle After that, then the API change blueprint spec should be created that adds the /loadbalancers resource and other changes. If anyone else can add anything please do. If I said anything wrong please correct me, and if anyone can answer my question above please do. Thanks, Brandon Logan On Mon, 2014-05-19 at 17:06 -0400, Susanne Balle wrote: Great summit!! fantastic to meeting you all in person. We now have agreement on the Object model. How do we turn that into blueprints and also how do we start making progress on the rest of the items we agree upon at the summit? Susanne On Fri, May 16, 2014 at 2:07 AM, Brandon Logan brandon.lo...@rackspace.com wrote: Yeah that’s a good point. Thanks! From: Eugene Nikanorov enikano...@mirantis.com Reply-To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Date: Thursday, May 15, 2014 at 10:38 PM To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model? Brandon, It's allowed right now just per API. It's up to a backend to decide the status of a node in case some monitors find it dead. Thanks, Eugene. On Fri, May 16, 2014 at 4:41 AM, Brandon Logan brandon.lo...@rackspace.com wrote: I have concerns about multiple health monitors on the same pool. Is this always going to be the same type of health monitor? There’s also ambiguity in the case where one health monitor fails and another doesn’t. Is it an AND or OR that determines whether the member is down or not? 
Thanks, Brandon Logan From: Eugene Nikanorov enikano...@mirantis.com Reply-To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Date: Thursday, May 15, 2014 at 9:55 AM To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model? Vijay, Pools-monitors are still many to many, if it's not so on the picture - we'll fix that. I brought this up as an example of how we dealt with m:n via API. Thanks, Eugene. On Thu, May 15, 2014 at 6:43 PM, Vijay Venkatachalam vijay.venkatacha...@citrix.com wrote: Thanks for the clarification. Eugene. A tangential point since you brought
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Hi Craig, Implementation-specific options are not exposed through the API, or otherwise it would be inconsistent, given that we are moving to a flavor-based approach of specifying service requirements where implementation is completely hidden from the user. Thanks, Eugene. On Thu, May 15, 2014 at 3:24 AM, IWAMOTO Toshihiro iwam...@valinux.co.jpwrote: At Wed, 14 May 2014 10:18:29 -0700, Stephen Balukoff wrote: [1 multipart/alternative (7bit)] [1.1 text/plain; UTF-8 (7bit)] Hi Craig, I'm attaching the latest object model diagram as discussed with the RAX team last night, Samuel and Eugene. Note that we don't necessarily have HP's blessing on this model yet, nor do we have Neutron core developer buy in. But in any case, here it is. Sorry for jumping into a meta-argument, but what the LBaaS community needs to do first is to figure out how to make the neutron community agree on a LBaaS proposal. On the other hand, the neutron community (or the core?) needs to make it clear that what criteria or process is required to approve some LBaaS idea (obj. model, API, whatsoever). I'm sorry to say (again) that not much people have energy to follow and check out small differences in those large pictures of LBaaS object models. -- IWAMOTO Toshihiro ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Hi Vijay, When you say the API is not available, it means this should not be considered a resource/entity, correct? But then there would be an API like a bind API that accepts loadbalancer_id and listener_id, which creates this object. And also, there would be an API that will be used to list the listeners of a LoadBalancer. Right? Right, that's the same way health monitors and pools work right now: there is a separate REST action to associate a health monitor with a pool. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
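For concreteness, the v1-style association Eugene refers to is a separate REST call made against the pool, roughly like the following (UUIDs are placeholders, and the exact paths should be checked against the API reference):

    POST /v2.0/lb/pools/{pool-uuid}/health_monitors
    {"health_monitor": {"id": "{monitor-uuid}"}}

    DELETE /v2.0/lb/pools/{pool-uuid}/health_monitors/{monitor-uuid}

A bind API for loadbalancer/listener could follow the same shape: the association is created and deleted as its own sub-resource rather than being an attribute of either object.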
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Hi Iwamoto, I think you may want to talk to Mark McClain about why we want to move to flavors instead of letting the user choose the implementation. Basically, the arguments against flavors (flexible mapping) are based on some lack of understanding of cloud operator requirements. The ability to choose a provider was the first, simplistic step in allowing multi-vendor support, but it turned out to be not very convenient for cloud operators. And obviously implementation details (especially vendor-specific ones) should be hidden behind the API; abstracting the tenant from them is the goal of all OpenStack services. Thanks, Eugene. On Thu, May 15, 2014 at 6:45 PM, IWAMOTO Toshihiro iwam...@valinux.co.jp wrote: It is a pity to see enormous LBaaS efforts have been largely spinning (in the sense of spinlocks) for a while. At Thu, 15 May 2014 14:31:54 +0400, Eugene Nikanorov wrote: Hi Craig, Implementation-specific options are not exposed through the API, or otherwise it would be inconsistent, given that we are moving to a flavor-based approach of specifying service requirements where implementation is completely hidden from the user. There were a lot of arguments against your flavor proposal at the design summit session and at the meeting at the pod area. So, it is not clear if moving to a flavor-based approach will happen in a reasonable timeframe. OTOH, the flavor framework is not much more than bitmap matching of feature vectors. I think something is not going right, as we spent a good 30 minutes on this topic at the pod area. We'll be able to continue the same technical-level argument at home as we did for the last couple of months. My suggestion is to try to discuss differently here at the summit. -- IWAMOTO Toshihiro ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Vijay, Pools-monitors are still many to many, if it's not so on the picture - we'll fix that. I brought this up as an example of how we dealt with m:n via API. Thanks, Eugene. On Thu, May 15, 2014 at 6:43 PM, Vijay Venkatachalam vijay.venkatacha...@citrix.com wrote: Thanks for the clarification. Eugene. A tangential point since you brought healthmon and pool. There will be an additional entity called ‘PoolMonitorAssociation’ which results in a many to many relationship between pool and monitors. Right? Now, the model is indicating a pool can have only one monitor. So a minor correction is required to indicate the many to many relationship via PoolMonitorAssociation. Thanks, Vijay V. *From:* Eugene Nikanorov [mailto:enikano...@mirantis.com] *Sent:* Thursday, May 15, 2014 7:36 PM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model? Hi Vijay, When you say API is not available, it means this should not be considered like a resource/entity. Correct? But then, there would be API like a bind API, that accepts loadbalancer_id listener_id, which creates this object. And also, there would be an API that will be used to list the listeners of a LoadBalancer. Right? Right, that's the same as health monitors and pools work right now: there are separate REST action to associate healthmon to a pool Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model?
Brandon, It's allowed right now just per API. It's up to a backend to decide the status of a node in case some monitors find it dead. Thanks, Eugene. On Fri, May 16, 2014 at 4:41 AM, Brandon Logan brandon.lo...@rackspace.comwrote: I have concerns about multiple health monitors on the same pool. Is this always going to be the same type of health monitor? There’s also ambiguity in the case where one health monitor fails and another doesn’t. Is it an AND or OR that determines whether the member is down or not? Thanks, Brandon Logan From: Eugene Nikanorov enikano...@mirantis.com Reply-To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Date: Thursday, May 15, 2014 at 9:55 AM To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model? Vijay, Pools-monitors are still many to many, if it's not so on the picture - we'll fix that. I brought this up as an example of how we dealt with m:n via API. Thanks, Eugene. On Thu, May 15, 2014 at 6:43 PM, Vijay Venkatachalam vijay.venkatacha...@citrix.com wrote: Thanks for the clarification. Eugene. A tangential point since you brought healthmon and pool. There will be an additional entity called ‘PoolMonitorAssociation’ which results in a many to many relationship between pool and monitors. Right? Now, the model is indicating a pool can have only one monitor. So a minor correction is required to indicate the many to many relationship via PoolMonitorAssociation. Thanks, Vijay V. *From:* Eugene Nikanorov [mailto:enikano...@mirantis.com] *Sent:* Thursday, May 15, 2014 7:36 PM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* Re: [openstack-dev] [Neutron][LBaaS] Updated Object Model? Hi Vijay, When you say API is not available, it means this should not be considered like a resource/entity. Correct? But then, there would be API like a bind API, that accepts loadbalancer_id listener_id, which creates this object. And also, there would be an API that will be used to list the listeners of a LoadBalancer. Right? Right, that's the same as health monitors and pools work right now: there are separate REST action to associate healthmon to a pool Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron][LBaaS] Cancelling weekly meeting 15 of May
Hi, Let's skip the meeting this week for obvious reasons :) Also, Oleg and I will not be able to attend the meeting next week (if it is conducted), because we will be on vacation. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Meetup?
Hi, what time are you suggesting? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Meetup?
I'm going to attend the next nw session @b103; we can meet in between. I'm in b103 too. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Multiple VIPs per loadbalancer
Hi Stephen, While in second approach VIP remains a keeper of L2/L3 information, while listeners keep L4+ information. That seems to be more clear. There's a complication though: Pools may also need some L2/L3 information (per the discussion of adding subnet_id as an attribute of the pool, eh.) Right, pool needs that, but I'm talking about frontend here. Obviously in case loadbalancer balances traffic over several pools that may be on different subnets - it may need to have l2 ports on each of them, just as you said below. And actually, there are a few cases that have been discussed where operators do want users to be able to have some (limited) control over the back end. These almost all have to do with VIP affinity. The need of that is understood, however, I think there are other options based on smarter scheduling and SLAs. Also we've heard objection to this approach several times from other core team members (this discussion has been going for more than half a year now), so I would suggest to move forward with single L2 port approach. Then the question goes down to terminology: loadbalancer/VIPs or VIP/Listeners. To be fair this is definitely about more than terminology. In the examples you've listed mentioning loadbalancer objects, it seems to me that you're ignoring that this model also still contains Listeners. So, to be more accurate, it's really about: loadbalancer/VIPs/Listeners or VIPs/Listeners. To me it seems that loadbalancer/VIPs/Listeners is only needed for multiple l2 enpoints, e.g.: container / n x L2 / m x L4+ In single L2 endpoint case (i'm, again, talking about the front end) If we say that VIP is L4+ only (tcp port, protocol, etc), then to properly handle multiple VIPs in this case, L2/L3 information should be stored in loadbalancer. To me, that says it's all about: Does the loadbalancer object add something meaningful to this model? And I think the answer is: * To smaller users with very basic load balancing needs: No (mostly, though to many it's still yes) Agree. * To larger customers with advanced load balancing needs: Yes. * To operators of any size: Yes. While operators may want to operate/monitor backends directly, that seems to be out of tenant API scope. We need to evaluate those 'advanced needs' for tenants and see if we can address that without making lbaas a proxy between user and LB appliance. I've outlined my reasoning for thinking so in the other discussion thread. The reasoning seems clear to me, I just suggest that there are other options that could help in supporting those advanced cases. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Multiple VIPs per loadbalancer
Hi Carlos, On May 9, 2014, at 3:36 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Also we've heard objection to this approach several times from other core team members (this discussion has been going for more than half a year now), so I would suggest to move forward with single L2 port approach. Objections to multiple ports per loadbalancer or objections to the Loadbalancer object itself? If its the latter then you may have a valid argument by authority but its impossible to verify because these core team members are remaining silent through out all these discussions. We can't be dissuaded due to FUD(Fear, Uncertainty and Doub)t that these silent core team members will suddenly reject this discussion in the future. We aren't going to put our users at risk due to FUD. I think you had a chance to hear this argument yourself (from several different core members: Mark McClain, Salvatore Orlando, Kyle Mestery) on those meetings we had in past 2 months. I was advocating 'loadbalancer' (in it's extended version) once too, receiving negative opinions as well. In general this approach puts too much of control of a backend to user's hands and this goes in opposite direction than neutron project. If it's just about the name of the root object - VIP suits this role too, so I'm fine with that. I also find VIP/Listeners model a bit more clearer per definitions in our glossary. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Stephen, Some comments on comments on comments: On Fri, May 9, 2014 at 10:25 PM, Stephen Balukoff sbaluk...@bluebox.netwrote: Hi Eugene, This assumes that 'VIP' is an entity that can contain both an IPv4 address and an IPv6 address. This is how it is in the API proposal and corresponding object model that I suggested, but it is a slight re-definition of the term virtual IP as it's used in the rest of the industry. (And again, we're not yet in agreement that 'VIP' should actually contain two ip addresses like this.) That seems a minor issue to me. May be we can just introduce a statement that VIP has L2 endpoint first of all? In my mind, the main reasons I would like to see the container object are: - It solves the colocation / apolcation (or affinity / anti-affinity) problem for VIPs in a way that is much more intuitive to understand and less confusing for users than either the hints included in my API, or something based off the nova blueprint for doing the same for virtual servers/containers. (Full disclosure: There probably would still be a need for some anti-affinity logic at the logical load balancer level as well, though at this point it would be an operator concern only and expressed to the user in the flavor of the logical load balancer object, and probably be associated with different billing strategies. The user wants a dedicated physical load balancer? Then he should create one with this flavor, and note that it costs this much more...) In fact, that can be solved by scheduling, without letting user to control that. Flavor Framework will be able to address that. - From my experience, users are already familiar with the concept of what a logical load balancer actually is (ie. something that resembles a physical or virtual appliance from their perspective). So this probably fits into their view of the world better. That might be so, but apparently it goes in opposite direction than neutron in general (i.e. more abstraction) - It makes sense for Load Balancer as a Service to hand out logical load balancer objects. I think this will aid in a more intuitive understanding of the service for users who otherwise don't want to be concerned with operations. - This opens up the option for private cloud operators / providers to bill based on number of physical load balancers used (if the logical load balancer happens to coincide with physical load balancer appliances in their implementation) in a way that is going to be seen as more fair and more predictable to the user because the user has more control over it. And it seems to me this is accomplished without producing any undue burden on public cloud providers, those who don't bill this way, or those for whom the logical load balancer doesn't coincide with physical load balancer appliances. I don't see how 'loadbalancer' is better than 'VIP' here, other than being a bit closer term to 'logical loadbalancer'. - Attaching a flavor attribute to a logical load balancer seems like a better idea than attaching it to the VIP. What if the user wants to change the flavor on which their VIP is deployed (ie. without changing IP addresses)? What if they want to do this for several VIPs at once? I can definitely see this happening in our customer base through the lifecycle of many of our customers' applications. 
I don't see any problems with above cases if VIP is the root object - Having flavors associated with load balancers and not VIPs also allows for operators to provide a lot more differing product offerings to the user in a way that is simple for the user to understand. For example: - Flavor A is the cheap load balancer option, deployed on a shared platform used by many tenants that has fewer guarantees around performance and costs X. - Flavor B is guaranteed to be deployed on vendor Q's Super Special Product (tm) but to keep down costs, may be shared with other tenants, though not among a single tenant's load balancers unless the tenant uses the same load balancer id when deploying their VIPs (ie. user has control of affinity among their own VIPs, but no control over whether affinity happens with other tenants). It may experience variable performance as a result, but has higher guarantees than the above and costs a little more. - Flavor C is guaranteed to be deployed on vendor P's Even Better Super Special Product (tm) and is also guaranteed not to be shared among tenants. This is essentially the dedicated load balancer option that gets you the best guaranteed performance, but costs a lot more than the above. - ...and so on. Right, that's how flavors are supposed to work, but that's again unrelated to whether we make VIP or loadbalancer our root object.
Re: [openstack-dev] [Neutron][LBaaS] Multiple VIPs per loadbalancer
Hi Brandon, I, too, have not heard clear and concise reasons why the core team members would not like a logical load balancer object, or a load balancer object that maps to many vips, which in turn maps to many listeners. I've been to every LBaaS meeting for months now I think, and I just remember that you and others have said the core team members object to it, but not any clear reasons. Would it be possible for you to point us to an IRC chat log or a ML thread that does discuss that? Well, It seems to me that I understood that reason. The reason was formulated as 'networking functions, not virtualized appliances'. Loadbalancer object as a concept of virtual appliance (which Stephen and Carlos seem to be advocating) is not a kind of concept neutron tries to expose via its API. To me it looks like a valid argument. Yes, some users may expect that API will give them control of their super-dedicated LB appliance. Also, having API that looks like API of *typical* appliance may look more familiar to users who moved from operating physical lb appliance. But that's not the kind of ability neutron tries to allow, letting user to work with more abstracted concepts instead. A lot of operators have come into this project lately and most (if not all) would prefer an API construct like the one BBG and Rackspace have agreed on. This reason alone should be enough to revisit the topic with the core team members so we operators can fully understand their objections. I believe operators should play a large role in Openstack and their opinions and reasons why should be heard. I agree that operator's concerns need to be addressed, but the argument above just suggest that it can be done by other means rather then providing 'virtual appliance' functionality. I think it will be more constructive to think and work in this direction. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Stephen, Well, sure, except the user is going to want to know what the IP address(es) are for obvious reasons, and expect them to be taken from subnet(s) the user specifies. Asking the user to provide a Neutron network_id (ie. where we'll attach the L2 interface) isn't definitive here because a neutron network can contain many subnets, and these subnets might be either IPv4 or IPv6. Asking the user to provide an IPv4 and IPv6 subnet might cause us problems if the IPv4 subnet provided and the IPv6 subnet provided are not on the same neutron network. In that scenario, we'd need two L2 interfaces / neutron ports to service this, and of course some way to record this information in the model. Right, that's why VIP need to have clear definition in relation to L2 port: we allow one L2 port per VIP, hence only addresses from subnets from one network are allowed. That seems to be a fair limitation. We could introduce the restriction that all of the IP addresses / subnets associated with the VIP must come from the same neutron network, Right. but this begs the question: Why? Why shouldn't a VIP be allowed to connect to multiple neutron networks to service all its front-end IPs? If the answer to the above is there's no reason or because it's easier to implement, then I think these are not good reasons to apply these restrictions. If the answer to the above is because nobody deploys their IPv4 and IPv6 networks separate like that, then I think you are unfamiliar with the environments in which many operators must survive, nor the requirements imposed on us by our users. :P I approach this question from opposite side: if we allow this - we're exposing 'virtual appliance'-API, where user fully controls how lb instance is wired, how many VIPs it has, etc. As i said in other thread, that is 'virtual functions vs virtualized appliance' question which is about general neutron project goal. If something seem to map more easily on physical infrastructure (or to a concept o physical infra) doesn't mean that cloud API needs to follow that. In any case, if you agree that in the IPv4 + IPv6 case it might make sense to allow for multiple L2 interfaces on the VIP, doesn't it then also make more sense to define a VIP as a single IP address (ie. what the rest of the industry calls a VIP), and call the groupings of all these IP addresses together a 'load balancer' ? At that point the number of L2 interfaces required to service all the IPs in this VIP grouping becomes an implementation problem. For what it's worth, I do go back and forth on my opinion on this one, as you can probably tell. I'm trying to get us to a model that is first and foremost simple to understand for users, and relatively easy for operators and vendors to implement. Users are different, and you apparently consider those who understand networks and load balancing. I was saying that it's *much more intuitive to understand and less confusing for users* to do it using a logical load balancer construct. I've yet to see a good argument for why working with colocation_hints / apolocation_hints or affinity grouping rules (akin to the nova model) is *easier* *for the user to understand* than working with a logical load balancer model. Something done by hand may be much more intuitive than something performed by magic behind scheduling, flavors etc. But that doesn't seem like a good reason to me to put user in charge of defining resource placement. 
And by the way-- maybe you didn't see this in my example below, but just because a user is using separate load balancer objects doesn't mean the vendor or operator needs to implement these on separate pieces of hardware. Whether or not the operator decides to let the user have this level of control will be expressed in the flavor. Yes, and without container user has less than that - only balancing endpoints - VIPs, without direct control of how they are grouped within instances. That might be so, but apparently it goes in opposite direction than neutron in general (i.e. more abstraction) Doesn't more abstraction give vendors and operators more flexibility in how they implement it? Isn't that seen as a good thing in general? In any case, this sounds like your opinion more than an actual stated or implied agenda from the Neutron team. And even if it is an implied or stated agenda, perhaps it's worth revisiting the reason for having it? I'm translating the argument of other team members and it seems valid to me. For sure you can try to revisit those reasons ;) So what are the main arguments against having this container object? In answering this question, please keep in mind: - If you say implementation details, please just go ahead and be more specific because that's what I'm going to ask you to do anyway. If implementation details is the concern, please follow this with a hypothetical or concrete example as to what kinds of
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Carlos, The general objection is that if we don't need multiple VIPs (different ip, not just tcp ports) per single logical loadbalancer, then we don't need loadbalancer because everything else is addressed by VIP playing a role of loadbalancer. Regarding conclusions - I think we've heard enough negative opinions on the idea of 'container' to at least postpone this discussion to the point when we'll get some important use cases that could not be addressed by 'VIP as loadbalancer' Eugene. On Fri, May 9, 2014 at 8:33 AM, Carlos Garza carlos.ga...@rackspace.comwrote: On May 8, 2014, at 2:45 PM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi Carlos, Are you saying that we should only have a loadbalancer resource only in the case where we want it to span multiple L2 networks as if it were a router? I don't see how you arrived at that conclusion. Can you explain further. No, I mean that loadbalancer instance is needed if we need several *different* L2 endpoints for several front ends. That's basically 'virtual appliance' functionality that we've discussed on today's meeting. From looking at the irc log it looks like nothing conclusive came out of the meeting. I don't understand a lot of the conclusions you arrive at. For example your rejecting the notion of a loadbalancer concrete object unless its needed to include multi l2 network support. Will you make an honest effort to describe your objections here in the ML cause if we can't resolve it here its going to spill over into the summit. I certainly don't want this to dominate the summit. Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Brandon, Let me know if I am misunderstanding this, and please explain it further. A single neutron port can have many fixed IPs on many subnets. Since this is the case, you're saying that there is no need for the API to define multiple VIPs, since a single neutron port can represent all the IPs that all the VIPs require? Right, if you want to have both IPv4 and IPv6 addresses on the VIP, then it's possible with a single neutron port. So multiple VIPs are not needed for this case. Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
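As an illustration of the fixed-IP point, a single neutron port can be created with addresses from both an IPv4 and an IPv6 subnet of the same network in one call. The snippet below is a rough sketch using python-neutronclient; the credentials and UUIDs are placeholders, not values from this thread.

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='demo',
                            auth_url='http://controller:5000/v2.0')

    port_body = {
        'port': {
            'network_id': 'NETWORK_UUID',
            'fixed_ips': [
                {'subnet_id': 'IPV4_SUBNET_UUID'},  # v4 address allocated here
                {'subnet_id': 'IPV6_SUBNET_UUID'},  # v6 address allocated here
            ],
        }
    }
    port = neutron.create_port(port_body)
    # port['port']['fixed_ips'] now lists one address per subnet,
    # all on the same L2 port.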
[openstack-dev] [Neutron][LBaaS] Multiple VIPs per loadbalancer
Hi folks, I'm pulling this question out of another discussion: Is there a need to have multiple VIPs (e.g. multiple L2 ports/IP addresses) per logical loadbalancer? If so, we need the description of such cases to evaluate them. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Multiple VIPs per loadbalancer
On Fri, May 9, 2014 at 7:40 PM, Brandon Logan brandon.lo...@rackspace.com wrote: Yes, Rackspace has users that have multiple IPv4 and IPv6 VIPs on a single load balancer. For sure that can be supported by a particular physical appliance, but I doubt we need to translate it into a logical loadbalancer. However, I don't think it is a matter of it being needed. It's a matter of having an API that makes sense to a user. Just because the API has multiple VIPs doesn't mean every VIP needs its own port. In fact creating a port is an implementation detail (you know, that phrase that everyone throws out to stonewall any discussion?). The user doesn't care how many neutron ports are set up underneath, they only care about the VIPs. Right, port creation is an implementation detail; however, L2 connectivity for the frontend is a definite API expectation. I think VIP creation should have clear semantics: the user creates an L2 endpoint, i.e. an L2 port + an IPv4 [+ IPv6] address. If we agree that we only need one L2 port per logical loadbalancer, then it could be handled by two API/object-model approaches: 1) loadbalancer + VIPs, 1:n relationship; 2) VIP + listeners, 1:n relationship. You can see that from an API and object-model structure perspective those approaches are exactly the same. However, in (1) we would need to specify L3 information (IPv4 + IPv6 addresses, subnet_id) on the loadbalancer, and that would be inherited by the VIPs, which would keep the L4+ information. To me that seems a little bit confusing (per our glossary). In the second approach the VIP remains the keeper of L2/L3 information, while listeners keep the L4+ information. That seems clearer. If we want more than one L2 port, then we would need to combine those approaches and have loadbalancer + VIPs + Listeners, where the loadbalancer is a container that maps to a backend. However, as discussed at the last meeting, we don't want to let the user have direct control over the backend. Also, we've heard objections to this approach several times from other core team members (this discussion has been going on for more than half a year now), so I would suggest moving forward with the single L2 port approach. Then the question comes down to terminology: loadbalancer/VIPs or VIP/Listeners. Also, the load balancer wouldn't just be a container, the load balancer would have flavor, affinity, and other metadata on it. Plus, a user will expect to get a load balancer back. Since this object can only be described as a load balancer, the name of it shouldn't be up for debate. Per the comments above, the VIP can also play this role. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Stephen, A couple of inline comments: - The BBG proposal just has attributes for both an IPv4 address and an IPv6 address in its VIP object. (Meaning it's slightly different from what people are likely to assume the term VIP means.) This is a correct approach. A VIP has a single neutron port, which, however, may have IP addresses on several subnets at once, so IPv4+IPv6 is easily solved within one VIP. I think that's the preferred way. - *Maybe we should wait until the survey results are out?* No sense solving for use cases that nobody cares about, eh. *Maybe we should just look at the differences?* The core object models we've proposed are almost identical. Change the term Listener to Load Balancer in my model, and you've essentially got the same thing as the Rackspace model. I guess you meant VIP, not Listener. I think what is more important is the tree-like configuration structure. However, having Loadbalancer as the root object vs. VIP does make a difference in meaning: Loadbalancer implies several L2 ports for the frontend (e.g. multiple VIPs with their own IP addresses), while VIP implies only one L2 port. For example, I understand the Rackspace model is using a join object between load balancer and VIP so these can have an n:m relationship -- and this is almost entirely to solve use case #14 in the document. Sharing VIPs between loadbalancer instances is clearly overkill. *We need to decide what load balancer means and go with that.* This has been something that's come up a lot, and the more we ignore it, it seems to me, the more it just adds confusion to the discussion. Rackspace is defining a load balancer as: An entity that contains multiple VIPs, but only one tcp/udp port and protocol ( http://en.wikipedia.org/wiki/Load_balancing_%28computing%29 ). It may have a default pool (named just pool in the API object). It also may have a content switching object attached that defines L7 rules. I may have missed something: did you mean one tcp/udp port and protocol per VIP? Otherwise, how is that possible? *What does the root mean when we're looking at an object graph, not a tree? Or maybe the people most likely to use the single-call interface should have the strongest voices in deciding where it should actually be placed?* I think probably the biggest source of contention over the API proposals thus far is what object should be considered the root of the tree. The 'root object' has the sole purpose of transforming an arbitrary graph of objects into a tree. We can't move forward without properly defining it. This whole concept seems to strike me as odd -- because when you have a graph, even if it's somewhat tree-like (i.e. there are leaf nodes), does the term root even apply? Can someone please tell me what criteria they're using when they say that one object should be a root and another should not? The criteria are: - the user can think of the object as a representation of a 'logical service instance' (a logical loadbalancer) - the workflow starts with the object's creation - it makes sense to apply attributes like Flavor (service requirements) to it. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Adam, My comments inline: 1. We shouldn't be looking at the current model and deciding which object is the root object, or what object to rename as a loadbalancer... That's totally backwards! *We don't define which object is named the loadbalancer by looking for the root object -- we define which object is the root by looking for the object named loadbalancer.* I had hoped that was clear from the JSON examples in our API proposal, but I think maybe there was too much focus on the object model chart, where this isn't nearly as clearly communicated. 2. As I believe I have also said before, if I'm using X as a Service then I expect to get back an object of type X. I would be very frustrated/confused if, as a user, LBaaS returned me an object of type VIP when I POST a Create for my new load balancer. On this last point, I feel like I've said this enough times that I'm beating a dead horse... I think we definitely should be looking at existing API/BBG proposal for the root object. The question about whether we need additional 'Loadbalancer' resource or not is not a question about terminology, so (2) is not a valid argument. What really matters in answering the question about 'loadbalancer' resource is do we need multiple L2 ports per single loadbalancer. If we do - that could be a justification to add it. Right now the common perception is that this is not needed and hence, 'loadbalancer' is not required in the API or obj model. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] API proposal review thoughts
Hi Carlos, Are you saying that we should only have a loadbalancer resource only in the case where we want it to span multiple L2 networks as if it were a router? I don't see how you arrived at that conclusion. Can you explain further. No, I mean that loadbalancer instance is needed if we need several *different* L2 endpoints for several front ends. That's basically 'virtual appliance' functionality that we've discussed on today's meeting. Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS]L7 conent switching APIs
Hi folks, I think it's ok to include content modification in a general API proposal as long as it would then receive separate spec document in neutron-specs. When it comes to particular feature design and implementation it's better to be more granular. Thanks, Eugene. On Tue, May 6, 2014 at 4:05 AM, Stephen Balukoff sbaluk...@bluebox.netwrote: Hi Sam, In working off the document in the wiki on L7 functionality for LBaaS ( https://wiki.openstack.org/wiki/Neutron/LBaaS/l7 ), I notice that MODIFY_CONTENT is one of the actions listed for a L7VipPolicyAssociation. That's the primary reason I included this in the API design I created. To be honest, it frustrates me more than a little to hear after the fact that the only locate-able documentation like this online is inaccurate on many meaningful details like this. I could actually go either way on this issue: I included content modification as one possible action of L7Policies, but it is somewhat wedged in there: It works, but in L7Policies that do content modification or blocking of the request, the order field as I've proposed it could be confusing for users, and these L7Policies wouldn't be associated with a back-end pool anyway. I'm interested in hearing others' opinions on this as well. Stephen On Mon, May 5, 2014 at 6:47 AM, Samuel Bercovici samu...@radware.comwrote: Hi Stephen, For Icehouse we did not go into L7 content modification as the general feeling was that it might not be exactly the same as content switching and we wanted to tackle content switching fiest. L7 content switching and L7 content modification are different, I prefer to be explicit and declarative and use different objects. This will make the API more readable. What do you think? I plan to look deeper into L7 content modification later this week to propose a list of capabilities. -Sam. *From:* Stephen Balukoff [mailto:sbaluk...@bluebox.net] *Sent:* Saturday, May 03, 2014 1:33 AM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* Re: [openstack-dev] [Neutron][LBaaS]L7 conent switching APIs Hi Adam and Samuel! Thanks for the questions / comments! Reactions in-line: On Thu, May 1, 2014 at 8:14 PM, Adam Harwell adam.harw...@rackspace.com wrote: Stephen, the way I understood your API proposal, I thought you could essentially combine L7Rules in an L7Policy, and have multiple L7Policies, implying that the L7Rules would use AND style combination, while the L7Policies themselves would use OR combination (I think I said that right, almost seems like a tongue-twister while I'm running on pure caffeine). So, if I said: Well, my goal wasn't to create a whole DSL for this (or anything much resembling this) because: 1. Real-world usage of the L7 stuff is generally pretty primitive. Most L7Policies will consist of 1 rule. Those that consist of more than one rule are almost always the sort that need a simple sort. This is based off the usage data collected here (which admittedly only has Blue Box's data-- because apparently nobody else even offers L7 right now?) https://docs.google.com/spreadsheet/ccc?key=0Ar1FuMFYRhgadDVXZ25NM2NfbGtLTkR0TDFNUWJQUWcusp=sharing 2. I was trying to keep things as simple as possible to make it easier for load balancer vendors to support. (That is to say, I wouldn't expect all vendors to provide the same kind of functionality as HAProxy ACLs, for example.) 
Having said this, I think yours and Sam's clarification that different L7Policies can be used to effective OR conditions together makes sense, and therefore assuming all the Rules in a given policy are ANDed together makes sense. If we do this, it therefore also might make sense to expose other criteria on which L7Rules can be made, like HTTP method used for the request and whatnot. Also, should we introduce a flag to say whether a given Rule's condition should be negated? (eg. HTTP method is GET and URL is *not* /api) This would get us closer to being able to use more sophisticated logic for L7 routing. Does anyone foresee the need to offer this kind of functionality? * The policy { rules: [ rule1: match path REGEX .*index.*, rule2: match path REGEX hello/.* ] } directs to Pool A * The policy { rules: [ rule1: match hostname EQ mysite.com ] } directs to Pool B then order would matter for the policies themselves. In this case, if they ran in the order I listed, it would match mysite.com/hello/index.htm and direct it to Pool A, while mysite.com/hello/nope.htm would not match BOTH rules in the first policy, and would be caught by the second policy, directing it to Pool B. If I had wanted the first policy to use OR logic, I would have just specified two separate policies both pointing to Pool A: Clarification on this: There is an 'order' attribute to L7Policies. :) But again, if all the L7Rules in a given policy
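To restate the AND/OR semantics discussed above with a cleaner example: rules inside one policy are ANDed together, and OR is expressed by creating a second policy that points at the same (or another) pool, evaluated in 'order'. The JSON below is illustrative only; the field names approximate the wiki proposal rather than a finalized API.

Policy 1 (order 1): both PATH rules must match -> Pool A

    {"l7policy": {"action": "REDIRECT_TO_POOL", "pool_id": "POOL_A_UUID",
                  "order": 1,
                  "rules": [{"type": "PATH", "compare_type": "REGEX", "value": ".*index.*"},
                            {"type": "PATH", "compare_type": "REGEX", "value": "hello/.*"}]}}

Policy 2 (order 2): hostname rule matches -> Pool B

    {"l7policy": {"action": "REDIRECT_TO_POOL", "pool_id": "POOL_B_UUID",
                  "order": 2,
                  "rules": [{"type": "HOST_NAME", "compare_type": "EQUAL_TO", "value": "mysite.com"}]}}

A request for mysite.com/hello/index.htm matches both rules in policy 1 and goes to Pool A; mysite.com/hello/nope.htm fails policy 1 (both rules must match) and falls through to policy 2, going to Pool B.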
[openstack-dev] [Neutron][LBaaS] Subteam meeting Thursday, 05/08 14-00 UTC
Hi folks, This will be the last meeting before the summit, so I suggest we focus on the agenda for the two design track slots we have. In my experience, design tracks are not very good for in-depth discussion, so it only makes sense to present a road map and some other items that might need core team attention, like interaction with Barbican and such. Another item for the meeting will be a comparison of API proposals, which was an action item from the last meeting. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Use-Cases with VPNs Distinction
Agree with Sam here. Moreover, I think it makes sense to leave subnet as an attribute of the pool, which would mean that members reside in that subnet or are available (routable) from this subnet, and the LB should have a port on this subnet. Thanks, Eugene. On Fri, May 2, 2014 at 3:51 PM, Samuel Bercovici samu...@radware.com wrote: I think that associating a VIP subnet and a list of member subnets is a good choice. This declaratively says where the configuration expects layer 2 proximity. The minimum would be the VIP subnet, which in essence means the VIP and members are expected on the same subnet. Any member outside the specified subnets is supposedly accessible via routing. It might be an option to state the static route to use to access such member(s). In many cases the needed static routes could also be computed automatically. Regards, -Sam. On 2 May 2014, at 03:50, Stephen Balukoff sbaluk...@bluebox.net wrote: Hi Trevor, I was the one who wrote that use case based on discussion that came out of the question I wrote the list last week about SSL re-encryption: Someone had stated that sometimes pool members are local, and sometimes they are hosts across the internet, accessible either through the usual default route, or via a VPN tunnel. The point of this use case is to make the distinction that if we associate a neutron_subnet with the pool (rather than with the member), then some members of the pool that don't exist in that neutron_subnet might not be accessible from that neutron_subnet. However, if the behavior of the system is such that attempting to reach a host through the subnet's default route still works (whether that leads to communication over a VPN or the usual internet routes), then this might not be a problem. The other option is to associate the neutron_subnet with a pool member. But in this case there might be problems too. Namely: - The device or software that does the load balancing may need to have an interface on each of the member subnets, and presumably an IP address from which to originate requests. - How does one resolve cases where subnets have overlapping IP ranges? In the end, it may be simpler not to associate neutron_subnet with a pool at all. Maybe it only makes sense to do this for a VIP, and then the assumption would be that any member addresses one adds to pools must be accessible from the VIP subnet. (Which is easy, if the VIP exists on the same neutron_subnet. But this might require special routing within Neutron itself if it doesn't.) This topology question (ie. what is feasible, what do people actually want to do, and what is supported by the model) is one of the more difficult ones to answer, especially given that users of OpenStack that I've come in contact with barely understand the Neutron networking model, if at all. In our case, we don't actually have any users in the scenario of having members spread across different subnets that might not be routable, so the use case is somewhat contrived, but I thought it was worth mentioning based on what people were saying in the SSL re-encryption discussion last week. On Thu, May 1, 2014 at 1:52 PM, Trevor Vardeman trevor.varde...@rackspace.com wrote: Hello, After going back through the use-cases to double check some of my understanding, I realized I didn't quite understand the ones I had already answered. I'll use a specific use-case as an example of my misunderstanding here, and hopefully the clarification can be easily adapted to the rest of the use-cases that are similar.
Use Case 13: A project-user has an HTTPS application in which some of the back-end servers serving this application are in the same subnet, and others are across the internet, accessible via VPN. He wants this HTTPS application to be available to web clients via a single IP address. In this use-case, is the Load Balancer going to act as a node in the VPN? What I mean here, is the Load Balancer supposed to establish a connection to this VPN for the client, and simulate itself as a computer on the VPN? If this is not the case, wouldn't the VPN have a subnet ID, and simply be added to a pool during its creation? If the latter is accurate, would this not just be a basic HTTPS Load Balancer creation? After looking through the VPNaaS API, you would provide a subnet ID to the create VPN service request, and it establishes a VPN on said subnet. Couldn't this be provided to the Load Balancer pool as its subnet? Forgive me for requiring so much distinction here, but what may be clear to the creator of this use-case has left me confused. This same type of clarity would be very helpful across many of the other VPN-related use-cases. Thanks again! -Trevor ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org
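For what it's worth, the subnet-association question is easier to reason about with a concrete payload; the sketch below is purely illustrative (attribute names and addresses are made up), showing a pool anchored to one neutron subnet with one local member and one member reached over routing/VPN:

# Illustrative only.
pool = {
    "name": "https-app-pool",
    "protocol": "HTTPS",
    "subnet_id": "<local-subnet-id>",   # the subnet the LB sources member traffic from
    "members": [
        {"address": "10.0.1.10", "protocol_port": 443},     # same subnet
        {"address": "192.168.50.7", "protocol_port": 443},  # reached via VPN/default route
    ],
}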
Re: [openstack-dev] [Neutron][LBaaS] Fulfilling Operator Requirements: Driver / Management API
Hi Adam, My comments inline: On Fri, May 2, 2014 at 1:33 AM, Adam Harwell adam.harw...@rackspace.com wrote: I am sending this now to gauge interest and get feedback on what I see as an impending necessity — updating the existing haproxy driver, replacing it, or both. I agree with Stephen's first point here. For the HAProxy driver to support advanced use cases like routed mode, its agent would need to be significantly changed and receive some capabilities of the L3 agent. In fact, I'd suggest making an additional driver, not for haproxy in VMs, but for... dedicated haproxy nodes. A dedicated haproxy node is a host (similar to compute) with an L2 agent and an lbaas (not necessarily existing) agent on it. In fact, it's essentially the same model as used right now, but I think it has its advantages over haproxy-in-vm, at least: - the plugin driver doesn't need to manage VM life cycle (no orchestration) - immediate natural multitenant support with isolated networks - instead of adding haproxy in a VM, you add a process (which is both faster and more efficient); more scaling is achieved by adding a physical haproxy node; existing agent health reporting will make it available for loadbalancer scheduling automatically. *HAProxy*: This references two things currently, and I feel this is a source of some misunderstanding. When I refer to HAProxy (capitalized), I will be referring to the official software package (found here: http://haproxy.1wt.eu/ ), and when I refer to haproxy (lowercase, and in quotes) I will be referring to the neutron-lbaas driver (found here: https://github.com/openstack/neutron/tree/master/neutron/services/loadbalancer/drivers/haproxy ). The fact that the neutron-lbaas driver is named directly after the software package seems very unfortunate, and while it is not directly in the scope of what I'd like to discuss here, I would love to see it changed to more accurately reflect what it is -- one specific driver implementation that coincidentally uses HAProxy as a backend. More on this later. We have also been referring to the existing driver as haproxy-on-host. *Operator Requirements*: The requirements that can be found on the wiki page here: https://wiki.openstack.org/wiki/Neutron/LBaaS/requirements#Operator_Requirements and focusing on (but not limited to) the following list: * Scalability * DDoS Mitigation * Diagnostics * Logging and Alerting * Recoverability * High Availability (this is in the User Requirements section, but will be largely up to the operator to handle, so I would include it when discussing Operator Requirements) Those requirements are of very different kinds and they are going to be addressed by quite different components of lbaas, not solely by the driver. *Management API*: A restricted API containing resources that Cloud Operators could access, including most of the list of Operator Requirements (above). The work is being done on this front: we're designing a way for plugin drivers to expose their own API, which is specifically needed for an operator API that might not be common between providers. *Load Balancer (LB)*: I use this term very generically — essentially a logical entity that represents one use case. As used in the sentence: I have a Load Balancer in front of my website. or The Load Balancer I set up to offload SSL Decryption is lowering my CPU load nicely.
-- Overview -- What we've all been discussing for the past month or two (the API, Object Model, etc) is being directly driven by the User and Operator Requirements that have somewhat recently been enumerated (many thanks to everyone who has contributed to that discussion!). With that in mind, it is hopefully apparent that the current API proposals don't directly address many (or really, any) of the Operator requirements! Where in either of our API proposals are logging, high availability, scalability, DDoS mitigation, etc? I believe the answer is that none of these things can possibly be handled by the API, but are really implementation details at the driver level. Radware, NetScaler, Stingray, F5 and HAProxy of any flavour would all have very different ways of handling these things (these are just some of the possible backends I can think of). At the end of the day, what we really have are the requirements for a driver, which may or may not use HAProxy, that we hope will satisfy all of our concerns. That said, we may also want to have some form of Management API to expose these features in a common way. I'm not sure on the 'common way' here. I'd prefer to let vendors implement what is suitable for them and converge on similarities later. In this case, we really need to discuss two things: 1. Whether to update the existing haproxy driver to accommodate these Operator Requirements, or whether to start from scratch with a new driver (possibly both). See my comment on this above. I'd prefer to have
Re: [openstack-dev] [Neutron][LBaaS]Conforming to Open Stack API style in LBaaS
Hi, My opinion is that keeping the neutron API style is very important but it doesn't prevent a single-call API from being implemented. A flat fine-grained API is obviously the most flexible, but that doesn't mean we can't support a single-call API as well. By the way, looking at the implementation I see that such an API (single call) should also be supported in the drivers, so it is not just something 'on top' of the fine-grained API. Such a requirement comes from the fact that the fine-grained API is asynchronous. Thanks, Eugene. On Thu, May 1, 2014 at 5:18 AM, Kyle Mestery mest...@noironetworks.com wrote: I am fully onboard with the single-call approach as well, per this thread. On Wed, Apr 30, 2014 at 6:54 PM, Stephen Balukoff sbaluk...@bluebox.net wrote: It's also worth stating that coding a web UI to deploy a new service is often easier done when single-call is an option. (ie. only one failure scenario to deal with.) I don't see a strong reason we shouldn't allow both single-call creation of a whole bunch of related objects, as well as a workflow involving the creation of these objects individually. On Wed, Apr 30, 2014 at 3:50 PM, Jorge Miramontes jorge.miramon...@rackspace.com wrote: I agree it may be odd, but is that a strong argument? To me, following RESTful style/constructs is the main thing to consider. If people can specify everything in the parent resource then let them (i.e. single call). If they want to specify at a more granular level then let them do that too (i.e. multiple calls). At the end of the day the API user can choose the style they want. Cheers, --Jorge From: Youcef Laribi youcef.lar...@citrix.com Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Wednesday, April 30, 2014 1:35 PM To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Neutron][LBaaS]Conforming to Open Stack API style in LBaaS Sam, I think it’s important to keep the Neutron API style consistent. It would be odd if LBaaS uses a different style than the rest of the Neutron APIs. Youcef From: Samuel Bercovici [mailto:samu...@radware.com] Sent: Wednesday, April 30, 2014 10:59 AM To: openstack-dev@lists.openstack.org Subject: [openstack-dev] [Neutron][LBaaS]Conforming to Open Stack API style in LBaaS Hi Everyone, During the last few days I have looked into the different LBaaS API proposals. I have also looked at the API style used in Neutron. I wanted to see how Neutron APIs addressed “tree” like object models. My observations follow: 1. Security groups - http://docs.openstack.org/api/openstack-network/2.0/content/security-groups-ext.html ) – a. security-group-rules are children of security-groups, the capability to create a security group with its children in a single call is not possible. b. The capability to create security-group-rules using the following URI path v2.0/security-groups/{SG-ID}/security-group-rules is not supported c. The capability to update security-group-rules using the following URI path v2.0/security-groups/{SG-ID}/security-group-rules/{SGR-ID} is not supported d. The notion of creating security-group-rules (child object) without providing the parent {SG-ID} is not supported 2. Firewall as a service - http://docs.openstack.org/api/openstack-network/2.0/content/fwaas_ext.html - the API to manage firewall_policy and firewall_rule which have parent-child relationships behaves the same way as Security groups 3.
Group Policy – this is work in progress - https://wiki.openstack.org/wiki/Neutron/GroupPolicy - If I understand correctly, this API has a complex object model while the API adheres to the way other neutron APIs are done (ex: flat model, granular api, etc.) How critical is it to preserve a consistent API style for LBaaS? Should this be a consideration when evaluating API proposals? Regards, -Sam. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Stephen Balukoff Blue Box Group, LLC (800)613-4305 x807 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
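As a purely illustrative aside, the two styles being compared might look roughly like this (paths and attribute names are examples only, not a settled API):

# Granular / fine-grained style: one call per object, parents referenced by ID.
granular_calls = [
    ("POST", "/v2.0/lbaas/pools",
     {"pool": {"name": "web", "protocol": "HTTP", "lb_method": "ROUND_ROBIN"}}),
    ("POST", "/v2.0/lbaas/members",
     {"member": {"pool_id": "<pool-id>", "address": "10.0.0.5", "protocol_port": 80}}),
    ("POST", "/v2.0/lbaas/healthmonitors",
     {"healthmonitor": {"pool_id": "<pool-id>", "type": "HTTP", "delay": 5}}),
]
# Single-call style: the whole tree is nested under the parent resource, and
# the backend creates the children in one (asynchronous) operation.
single_call = ("POST", "/v2.0/lbaas/pools", {
    "pool": {
        "name": "web",
        "protocol": "HTTP",
        "lb_method": "ROUND_ROBIN",
        "members": [{"address": "10.0.0.5", "protocol_port": 80}],
        "healthmonitor": {"type": "HTTP", "delay": 5},
    },
})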
Re: [openstack-dev] [Neutron][LBaaS] Thoughts on current process
Hi Jorge, A couple of inline comments: Now that we have a set of requirements the next question to ask is, How do we prioritize requirements so that we can start designing and implementing them? Prioritization basically means that we want to support everything and only choose what is more important right now and what is less important and can be implemented later. Assuming requirements are prioritized (which as of today we have a pretty good idea of these priorities) the next step is to design before laying down any actual code. That's true. I'd only like to note that there actually was a road map and requirements with design before the code was written, both for the features that are already implemented and for those which are now hanging in limbo. I agree with Samuel that pushing the cart before the horse is a bad idea in this case (and it usually is the case in software development), especially since we have a pretty clear idea on what we need to be designing for. I understand that the current code base has been worked on by many individuals and the work done thus far is the reason why so many new faces are getting involved. However, we now have a completely updated set of requirements that the community has put together and trying to fit the requirements to existing code may or may not work. In my experience, I would argue that 99% of the time duct-taping existing code I really don't like the term duct-taping here. Here's the problem: you'll never be able to implement everything at once, you have to do it incrementally. That's how the ecosystem works. Each step can then be considered 'duct-taping' because each state you're getting to does not account for everything that was planned. And for sure, there will be design mistakes that need to be fixed. In the end there will be another cloud provider with another set of requirements... So in order to deal with that in a productive way there are a few guidelines: 1) follow the style of the ecosystem. Consistency is important. Keeping the style helps developers, reviewers and users of the product. 2) Preserve backward compatibility whenever possible. That's a very important point which however can be 'relaxed' if the existing code base is completely unable to evolve to support new requirements. to fit in new requirements results in buggy software. That being said, I usually don't like to rebuild a project from scratch. If I can I try to refactor as much as possible first. However, in this case we have a particular set of requirements that changes the game. Particularly, operator requirements have not been given the attention they deserve. Operator requirements really don't change the game here. You're right that operator requirements were not given the attention. It's not because developers of lbaas have not thought about it, it's because we were limited in dev and core reviewing resources, so implement But what is more important, operator requirements mostly don't affect the tenant API that we were discussing. It's true that almost none of them are addressed by the existing code base, but that only means they should be implemented. When talking about the existing code base I'd expect the following questions before any decision is made: 1) how can we do (implement) X with the existing code base? 2) if we can't do X, is it possible to fix the code in a simple way and just implement X on top of the existing code? If both answers are No, and X is really impossible with the existing code base - that could be a reason to deeply revise it.
Looking at operator requirements I don't see a single one that could lead to that. Because several of us have been spending large amounts of time on API proposals, and because we can safely assume that most operational requirements are abstracted into the driver layer I say we continue the conversation around the different proposals since this is the area we definitely need consensus on. So far there are three proposals--Stephen's, Rackspace's and Eugene's. I'd like to comment that my proposal is actually a small part of Stephen's that touches the core lbaas API only. So i would not treat it separately in this context. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Thoughts on current process
Sorry, missed the phrase ending: It's not because developers of lbaas have not thought about it, it's because we were limited in dev and core reviewing resources, so implement so implementing some of the operators requirements was always in our plans. Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] Thoughts on current process
Hi, On Thu, May 1, 2014 at 10:46 PM, Jorge Miramontes jorge.miramon...@rackspace.com wrote: Hey Eugene, I think there is a misunderstanding on what iterative development means to you and me and I want to make sure we are on the same page. First of all, I'll try not to use the term duct-taping even though it's a widely used term in the industry. I'm not against the term itself. It was applied several times to the existing code base, apparently without ANY real code analysis. That's especially clearly seen because all API proposals so far are focusing on managing the same set of lb primitives. Yes, the proposals introduce some new primitives; yes, some attributes and relationships differ from what is in the code. But nothing was proposed so far that would require completely throwing away existing code, not a single requirement. I understand that writing something from scratch can be more convenient for developers than studying existing code, but that's something we all have to do when working on an opensource project. My main concern is that implementing code on top of the current codebase to meet the smorgasbord of new requirements without thinking about overall design (since we know we will eventually want all the requirements satisfied at some point per your words) Overall design was thought out long before we started having all these discussions. And things are not quick in the neutron project, regardless of the amount of dev resources the lbaas subteam may have. is that some requirement implemented 6 months from now may change code architecture. Since we know we want to meet all requirements eventually, it makes logical sense to design for what we know we need and then figure out how to iteratively implement code over time. That was initially done at the Icehouse summit, and we just had to reiterate the discussion for new subteam members who have joined recently. I agree that we should design for what we know we need, but the primary option should be to continue existing work and analyse it to find gaps; that's what Samuel and I were focusing on. Stephen's proposal also goes along this idea because everything in his doc can be implemented gradually starting from existing code. That being said, if it makes sense to use existing code first then fine. In fact, I am a fan of trying to manipulate as little code as possible unless we absolutely have to. I just want to be a smart developer and design knowing I will eventually have to implement something. Not keeping things in mind can be dangerous. I fully agree and that's well understood. In short, I want to avoid having to perform multiple code refactors if possible and design upfront with the list of requirements the community has spent time fleshing out. Also, it seems like you have some implicit developer requirements that I'd like written somewhere. This may ease confusion as well. For example, you stated Consistency is important. A clear definition in the form of a developer requirement would be nice so that the community understands your expectations. It might be a bit difficult to formalize. So you know, we're not the only ones who will make decisions on the implementation. There is a core team, who are mostly out of lbaas discussions right now (and that fact will not change), who have their own views on what the neutron API should look like, what is allowed and what is not. To get a sense of it, one really needs to contribute to neutron: push the code through 10-20-50 review iterations, see what other developers are concerned about.
Obviously we can't get everyone into our discussions, but other core devs may (or may not, I don't know for sure) just -1 your implementation because you do /object1/id/object2/id/object3/id instead of the flat REST API that neutron has, or something like that. Then you'll probably spend another month or two trying to discuss these issues again with another group of folks. We don't have rigid guidelines on how the code should be written; understanding of that comes with experience and with discussions on gerrit. Lastly, in relation to operator requirements I didn't see you comment on whether you are a fan of working on an open-source driver together. Just so you know, operator requirements are very important for us and I honestly don't see how we can use any current driver without major modifications. This leads me to want to create a new driver with operator requirements being central to the design. The driver itself, IMO, is the most flexible part of the system. If you think it needs to be improved or even rewritten (once it does what the user asks it to do via the API) - I'd be glad to discuss that. I think rm_work (is that Adam Harwell?) was going to start a thread on this on the ML. Btw, is my understanding correct that you (as a cloud operator) are mostly interested in haproxy as a backend? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org
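For illustration only (resource names are placeholders), the kind of URI-style difference being referred to is roughly:

# Nested, hierarchical URI style (the kind that might attract a -1):
nested_style = "POST /v2.0/loadbalancers/{lb_id}/listeners/{listener_id}/pools"
# Flat style used elsewhere in neutron: every object is a top-level resource
# that references its parent by ID in the request body.
flat_style = ("POST /v2.0/pools", {"pool": {"listener_id": "<listener-id>", "name": "web"}})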
Re: [openstack-dev] [Neutron][LBaaS] Thoughts on current process
We wanted a discussion to happen on whether the existing object model would work with both API proposals. That blueprint being pushed to gerrit at the same time as Stephen mailing out his proposal made it seem like this was not going to happen. I'm sorry about that. In fact I was just planning to propose a more detailed design of what could be treated as a part of Stephen's proposal. I also think that we'll converge on Stephen's and Rackspace's proposal eventually. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Neutron] Nova-network to Neutron migration: issues with libvirt
I think it's better to test with some tcp connection (ssh session?) rather than with ping. Eugene. On Wed, Apr 30, 2014 at 5:28 PM, Oleg Bondarev obonda...@mirantis.com wrote: So by running ping during an instance interface update we can see ~10-20 sec of connectivity downtime. Here is a tcp capture during the update (pinging the ext net gateway):
05:58:41.020791 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 10, length 64
05:58:41.020866 IP 172.24.4.1 > 10.0.0.4: ICMP echo reply, id 29954, seq 10, length 64
05:58:41.885381 STP 802.1s, Rapid STP, CIST Flags [Learn, Forward, Agreement]
05:58:42.022785 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 11, length 64
05:58:42.022832 IP 172.24.4.1 > 10.0.0.4: ICMP echo reply, id 29954, seq 11, length 64
[vm interface updated..]
05:58:43.023310 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 12, length 64
05:58:44.024042 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 13, length 64
05:58:45.025760 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 14, length 64
05:58:46.026260 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 15, length 64
05:58:47.027813 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 16, length 64
05:58:48.028229 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 17, length 64
05:58:49.029881 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 18, length 64
05:58:50.029952 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 19, length 64
05:58:51.031380 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 20, length 64
05:58:52.032012 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 21, length 64
05:58:53.033456 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 22, length 64
05:58:54.034061 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 23, length 64
05:58:55.035170 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 24, length 64
05:58:56.035988 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 25, length 64
05:58:57.037285 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 26, length 64
05:58:57.045691 ARP, Request who-has 10.0.0.1 tell 10.0.0.4, length 28
05:58:58.038245 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 27, length 64
05:58:58.045496 ARP, Request who-has 10.0.0.1 tell 10.0.0.4, length 28
05:58:59.040143 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 28, length 64
05:58:59.045609 ARP, Request who-has 10.0.0.1 tell 10.0.0.4, length 28
05:59:00.040789 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 29, length 64
05:59:01.042333 ARP, Request who-has 10.0.0.1 tell 10.0.0.4, length 28
05:59:01.042618 ARP, Reply 10.0.0.1 is-at fa:16:3e:61:28:fa (oui Unknown), length 28
05:59:01.043471 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 30, length 64
05:59:01.063176 IP 172.24.4.1 > 10.0.0.4: ICMP echo reply, id 29954, seq 30, length 64
05:59:02.042699 IP 10.0.0.4 > 172.24.4.1: ICMP echo request, id 29954, seq 31, length 64
05:59:02.042840 IP 172.24.4.1 > 10.0.0.4: ICMP echo reply, id 29954, seq 31, length 64
However this connectivity downtime can be significantly reduced by restarting the network service on the instance right after the interface update. On Mon, Apr 28, 2014 at 6:29 PM, Kyle Mestery mest...@noironetworks.com wrote: On Mon, Apr 28, 2014 at 9:19 AM, Oleg Bondarev obonda...@mirantis.com wrote: On Mon, Apr 28, 2014 at 6:01 PM, Kyle Mestery mest...@noironetworks.com wrote: On Mon, Apr 28, 2014 at 8:54 AM, Oleg Bondarev obonda...@mirantis.com wrote: Yeah, I also saw in the docs that update-device is supported since version 0.8.0, not sure why it didn't work in my setup. I installed the latest libvirt 1.2.3 and now update-device works just fine and I am able to move the instance tap device from one bridge to another with no downtime and no reboot! I'll try to investigate why it didn't work on 0.9.8 and which is the minimal libvirt version for this. Wow, cool! This is really good news. Thanks for driving this! By chance did you notice if there was a drop in connectivity at all, or if the guest detected the move at all? Didn't check it yet. What in your opinion would be the best way of testing this? The simplest way would be to have a ping running when you run update-device and see if any packets are dropped. We can do more
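For reference, a rough sketch of the kind of live update-device operation being described, written against the libvirt Python bindings (the domain name, MAC address and bridge are made up; this is an assumption about the setup, not the exact steps used in the test above):

import libvirt

# New definition for the existing guest NIC (matched by MAC), pointing it at
# a different bridge; all names and addresses here are illustrative.
NEW_IFACE_XML = """
<interface type='bridge'>
  <mac address='fa:16:3e:00:00:01'/>
  <source bridge='br-int'/>
  <virtualport type='openvswitch'/>
  <model type='virtio'/>
</interface>
"""

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')
# Apply the change to the running domain without a reboot.
dom.updateDeviceFlags(NEW_IFACE_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

The virsh equivalent would be something like 'virsh update-device instance-00000001 iface.xml --live'.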
[openstack-dev] [Neutron][LBaaS] Subteam meeting Thursday 14-00 UTC
Hi, The agenda for the next meeting (Thursday, 1st of May, 14-00 UTC) is the following: 1) Stephen's API proposal: https://docs.google.com/document/d/129Da7zEk5p437_88_IKiQnNuWXzWaS_CpQVm4JBeQnc/edit#heading=h.hgpfh6kl7j7a The document proposes an API that covers pretty much all of the features that we've identified on the requirements page. The use cases are being addressed also. We need to converge on the general approach proposed there. 2) Summit agenda. We have two sessions at the neutron track. It makes sense to focus on the topics that will benefit most from face-to-face discussion. Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][LBaaS] BBG edit of new API proposal
On Fri, Apr 25, 2014 at 4:03 AM, Eugene Nikanorov enikano...@mirantis.com wrote: Hi Stephen, Thanks for the great document. As I promised, I'll try to make a few action items out of it. First of all, I'd like to say that the API you have proposed is very close to what is proposed in the blueprint https://review.openstack.org/#/c/89903/ with several differences I'd like to address here and make them action items. Ok, the above blueprint was uploaded on April 23rd, literally the day I sent my API proposal to the mailing list. And this was after it was known via the action items in the previous week's IRC meeting that my team and I would be working hard over the next week to produce an API proposal revision, based on the excellent work of the Rackspace team from the previous week. I can understand having a plan B in case I missed a deadline there (after all, I wasn't all that confident we'd get it done in time, but I worked through the weekend and pulled some very long days to get it done)-- but it's pretty offensive to me to realize that the action items from the previous week's meeting apparently weren't being taken seriously. I'm not sure which of my actions exactly has offended you, was it submitting a blueprint to neutron-spec? I've been working on several proposals, including the one on review, since the start of Icehouse, preparing wikis, docs, and even an attempt to actually implement the proposal, so I'm not sure what exactly I did wrong. I was not thinking about beating anyone to submitting a blueprint first, it was on launchpad since the end of havana anyway and I've just renewed it and followed the new process. That not only was there apparently never an intent to seriously consider my proposal, but now, if I want to be heard, I'm going to have to familiarize myself with your proposal and fight tooth and nail to be heard to fix any brokenness I will almost certainly find in it. And I have to do this with very little confidence that my voice is going to be heard. I think our disconnect comes from the fact that the whole discussion, which started at the end of Havana with fixing the API so that L7 and multiple listeners are possible, came to the discussion of 'what the lbaas API of the dream should look like'. Your document does a great job of addressing the latter, but it's hardly a single action item/single blueprint/single patch. Gee. Thanks. Sorry, in fact I should have said that some of those action items are for me actually; your doc is very detailed, while the spec on review is not yet. I'll try to bring it into accordance with your doc. So, first of all, I think the API described in the doc seems to account for all cases we had in mind; I didn't check on a case-by-case basis, it's just a first glance impression looking at the REST resource set and their attributes. As an aside, let us please, please, please get these use cases out of being in mind and into a unified, digestible, revision-controlled document. I guarantee you haven't thought of some of the use cases I have in mind. The general idea of the whole API/obj model improvement is that we create a baseline for all advanced features/usecases that we have in the roadmap. Which means that those features then can be added incrementally. Incrementally means that resources or attributes might be added, but workflow remains and backward compatibility is preserved. That was not the case with multiple listeners and L7. I would argue that that wasn't the case for SSL either in the old model. And even the model I proposed doesn't address HA yet.
The model shouldn't immediately address HA. You just have to make sure that once you decided to add HA, you don't have to redesign the rest of it (but that probably requires some ideas on how to add HA) It's assumed that the object model proposed won't affect the ability of vendors to implement HA in whatever way makes sense for their organization... but we don't really know that until we see some rough models on that front. Anyway, I think the point remains that the whole reason for the API and object model revision was to be more forward thinking, specifically about those features that are currently sorely lacking from Neutron LBaaS, and which both users and operators clearly need before the solution can be deployed on a wide scale. And yes, of course this is a design that will necessarily be implemented incrementally! First priority should be to deliver at least the same level of functionality that the old api / object model delivers. But, again, I think it's a really good idea to be forward thinking about these sorts of things. If I'm planning a road trip from Denver to New York, I like to consider at a high level at least what kind of route will take me there (so I can be sure that I don't realize 500 miles into the trip that I've been driving toward L.A. the whole time). So, a couple on general comments: 1. Whole discussion
Re: [openstack-dev] [Neutron][LBaaS] BBG edit of new API proposal
Hi, You knew from the action items that came out of the IRC meeting of April 17 that my team would be working on an API revision proposal. You also knew that this proposal was to be accompanied by an object model diagram and glossary, in order to clear up confusion. You were in that meeting, you saw the action items being created. Heck, you even added the 'prepare API for SSL and L7' directive for my team yourself! The implied but not stated assumption about this work was that it would be fairly evaluated once done, and that we would be given a short window (ie. about a week) in which to fully prepare and state our proposal. Your actions, though, were apparently to produce your own version of the same in blueprint form without notifying anyone in the group that you were going to be doing this, let alone my team. How could you have given my API proposal a fair shake prior to publishing your blueprint, if both came out on the same day? (In fact, I'm led to believe that you and other Neutron LBaaS developers hadn't even looked at my proposal before the meeting on 4/24, where y'all started determining product direction, apparently by edict.) Therefore, looking honestly at your actions on this and trying to give you the benefit of the doubt, I still must assume that you never intended to seriously consider our proposal. That's strange to hear because the spec on review is a part of what is proposed in the document made by you and your team. Once again I'm not sure what this heated discussion is all about. The doc does a good job and we will continue discussing it, while a part of it (the spec about VIPs/Listeners/Pools) is on review where we, as the lbaas subteam, can actually finalize a part of it. Do you now understand why I find this offensive? Can you also understand how others, seeing how this was handled, might now be reluctant to participate? People may find different things to be offensive. I can also say much on this, but would not like to continue the conversation in this direction. Right, so *if* we decide to go with my proposal, we need to first decide which parts we're actually going to go with-- I don't expect my proposal to be complete or perfect by any means, and we need to have honest discussion of this first. Then, once we've more-or-less come to a consensus on this overall direction, I'm not sure I understand what you mean by 'overall direction'. Was there ever an idea of not supporting HA, or L7, or SSL, or of not satisfying other requirements? The discussion could be on how to do it, then. it makes sense to think about how to split up the work into digestible, parallelize-able chunks that can be tackled by the various interested parties working on this project. (My team actually wanted to propose a road map and attach it to the proposal, but there simply wasn't time if we wanted to get the API out before the next IRC meeting in enough time for people to have had a chance to look at it.) Why embark on this process at all if we don't have any real idea of what the end-goal looks like? I hope this will not look offensive if I say that the 'end goal' was discussed at the Grizzly, Havana and Icehouse summits and even before. While not every requirement was discussed at the summits, there was a quite clear understanding of the dependencies between features. And I don't mean that 'the roadmap is fixed' or anything like that, I'm just saying that thinking about the end-goal is not something we started doing 1.5 months ago.
So speaking about 'holistic view', please tell, how L7 and SSL are related, does one depend on another or affect another? Right-- so paraphrasing what I think you're saying: Let's consider how we're going to tackle the SSL and L7 problems at a later date, once we've fixed the object model to be compatible with how we're going to tackle SSL and L7. This implies you've already considered how to tackle SSL and L7 in making your object model change proposal! That was mostly about L7, but for sure subteam has considered and designed L7, and I also assume you've seen those design docs made by Sam: on your diagrams I see the same objects and relationship as in Samuel's doc. My point is: Unless you have already considered how to do SSL and L7, then any object model change proposal is essentially pointless. Why make these changes unless we already know we're going to do SSL and L7 in some way that is compatible with the proposed changes? Evaluating these things *necessarily* must happen at the same time (ie. holistically) or the object model proposal is pointless and unjustified. Actually implementing the design will happen first with the core object model changes, and then the SSL and L7 features. ... But I think I've already made my point that not considering things like SSL and L7 in any proposal means there's no clear indication that core object model changes will be compatible with what needs to happen for SSL and L7.
Re: [openstack-dev] [Neutron][LBaaS] BBG edit of new API proposal
Hi Stephen, Thanks for the great document. As I promised, I'll try to make a few action items out of it. First of all, I'd like to say that the API you have proposed is very close to what is proposed in the blueprint https://review.openstack.org/#/c/89903/ with several differences I'd like to address here and make them action items. So, first of all, I think the API described in the doc seems to account for all cases we had in mind; I didn't check on a case-by-case basis, it's just a first glance impression looking at the REST resource set and their attributes. The general idea of the whole API/obj model improvement is that we create a baseline for all advanced features/usecases that we have in the roadmap. Which means that those features then can be added incrementally. Incrementally means that resources or attributes might be added, but workflow remains and backward compatibility is preserved. That was not the case with multiple listeners and L7. So, a couple of general comments: 1. The whole discussion about API/obj model improvement had the goal of allowing multiple pools and multiple listeners. For that purpose a loadbalancer instance might be an extra. The good thing about 'loadbalancer' is that it can be introduced in the API in an incremental way. So, VIP+listeners itself is already quite a flexible construct (where the VIP is a root object playing the loadbalancer role) that addresses our immediate needs. So I'd like to extract the loadbalancer API and corresponding use cases into another blueprint. You know that the loadbalancer concept has raised very heated discussions, so it makes sense to continue discussing it separately, keeping in mind that introducing loadbalancer is not very complex and it may be done on top of the VIPs/listeners API 2. SSL-related objects. SSL is a rather big deal, both from the API and the object model perspective; it was a separate blueprint in Icehouse and I think it makes sense to work separately on it. What I mean is that SSL doesn't affect the core API (VIPs/listeners/pools) other than adding some attributes to listeners. 3. L7 is also separate work; it will not be accounted for in the 'API improvement' blueprint. You can sync with Samuel for this as we already have pretty detailed blueprints on that. 4. Attribute differences in REST resources. This falls into two categories: - existing attributes that should belong to one or another resource, - attributes that should be added (e.g. they didn't exist in the current API) The first class is better addressed in the blueprint review. The second class could be small action items/blueprints or even bugs. Example: 1) custom_503 - that attribute deserves its own miniblueprint, I'd keep it out of scope of the 'API improvement' work. 2) ipv6_subnet_id/addressv6 - that IMO also deserves its own miniblueprint (the whole thing about ipv6 support) So, I'd like to make the following action items out of the document: 1. Extract the 'core API' - VIPs/Listeners/Pools/Members/Healthmonitors. This action item is actually the blueprint that I've filed and that's what I'm going to implement 2. Work on defining a single-call API that goes along with the single-object core API (above) Your document already does a very good job on this front. 3. Extract the 'Loadbalancer' portion of the API into an additional doc/blueprint. It deserves its own discussion and use cases. I think separating it will also help to reduce discussion contention. 4. Work with https://review.openstack.org/#/c/89903/ to define proper attribute placement of existing attributes 5.
Define a set of attributes that are missing in proposed API and make a list of work items based on that. (I assume that there also could be some, that may make sense to include in proposal) I think following this list will actually help us to make iterative progress and also to work on items in parallel. Thanks again for the great document! Eugene. On Thu, Apr 24, 2014 at 4:07 AM, Stephen Balukoff sbaluk...@bluebox.netwrote: Hi Brandon! Thanks for the questions. Responses inline: On Wed, Apr 23, 2014 at 2:51 PM, Brandon Logan brandon.lo...@rackspace.com wrote: Hey Stephen! Thanks for the proposal and spending time on it (I know it is a large time investment). This is actually very similar in structure to something I had started on except a load balancer object was the root and it had a one-to-many relationship to VIPs and each VIP had a one-to-many relationship to listeners. We decided to scrap that because it became a bit complicated and the concept of sharing VIPs across load balancers (single port and protocol this time), accomplished the same thing but with a more streamlined API. The multiple VIPs having multiple listeners was the main complexity and your proposal does not have that either. Anyway, some comments and questions on your proposal are listed below. Most are minor quibbles, questions and suggestions that can probably be fleshed out later when we decide on one proposal and I am going to use your object names as
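A hedged sketch of the 'core API' resource set named in action item 1, with purely illustrative attribute names (not a final schema):

# Illustrative only: the core resource tree and how objects reference each other.
vip = {"name": "web-vip", "address": "203.0.113.10", "subnet_id": "<subnet-id>"}
listener = {"vip_id": "<vip-id>", "protocol": "HTTP", "protocol_port": 80,
            "default_pool_id": "<pool-id>"}
pool = {"protocol": "HTTP", "lb_method": "ROUND_ROBIN", "healthmonitor_id": "<hm-id>"}
member = {"pool_id": "<pool-id>", "address": "10.0.0.5", "protocol_port": 80}
healthmonitor = {"type": "HTTP", "delay": 5, "timeout": 3, "max_retries": 3}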
Re: [openstack-dev] [Neutron] Flavor(?) Framework
In addition, I prefer managing flavor/type through the API and decoupling flavor/type definition from provider definitions in configuration files as Cinder and Nova do. Yes, that's the current proposal. Thanks, Eugene. On Fri, Apr 25, 2014 at 5:41 PM, Akihiro Motoki mot...@da.jp.nec.com wrote: Hi, I have the same question as Mark. Why is flavor not desired? My first vote is flavor first, and then type. There are similar cases in other OpenStack projects. Nova uses flavor and Cinder uses (volume) type for similar cases. Both cases are similar to our use cases and I think it is better to use either of them to avoid more confusion from naming for users and operators. Cinder volume_type detail is available at [1]. In Cinder volume_type, we can define multiple volume_types for one driver. (more precisely, a volume_type is associated with one backend definition and we can define multiple backend definitions for one backend driver). In addition, I prefer managing flavor/type through the API and decoupling flavor/type definition from provider definitions in configuration files as Cinder and Nova do. [1] http://docs.openstack.org/admin-guide-cloud/content/multi_backend.html Thanks, Akihiro (2014/04/24 0:05), Eugene Nikanorov wrote: Hi neutrons, A quick question of the ^^^ I heard from many of you that a term 'flavor' is undesirable, but so far there were no suggestions for the notion that we are going to introduce. So please, suggest your name for the resource. Names that I've been thinking of: - Capability group - Service Offering Thoughts? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
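As a purely illustrative sketch (not the actual FlavorFramework design), an API-managed flavor/type resource for an advanced service might look something like this, with capability tags that the framework could later match against provider drivers:

# All names below are made up for illustration.
loadbalancer_flavor = {
    "name": "gold-lb",
    "service_type": "LOADBALANCER",
    "description": "HA pair, SSL termination, L7 rules",
    "capabilities": {"ha": True, "ssl_termination": True, "l7": True},
}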
Re: [openstack-dev] [Neutron] Flavor(?) Framework
I'd also recommend simplifying the API and CLI by removing the implementation-focused provider type stuff eventually, as well, since a service type framework would essentially make that no longer needed -- at least on the public API side of things. Correct, that's the part of the proposed change, although we probably need to support it for one more release. Eugene. On Fri, Apr 25, 2014 at 9:40 PM, Jay Pipes jaypi...@gmail.com wrote: On Fri, 2014-04-25 at 13:32 -0400, Mohammad Banikazemi wrote: As I understand the proposed flavor framework, the intention is to provide a mechanism for specifying different flavors of a given service type as they are already defined. So using the term type may be confusing. Here we want to specify possibly different set of capabilities within a given defined service type. Hi Mohammed, Yes, the trouble in Neutron is the existing service type usage... I proposed to rename that to service family or service class in a previous email, and use a type for each service class, so: load balancer type firewall type VPN type I'd also recommend simplifying the API and CLI by removing the implementation-focused provider type stuff eventually, as well, since a service type framework would essentially make that no longer needed -- at least on the public API side of things. Best, -jay Inactive hide details for Jay Pipes ---04/25/2014 12:09:43 PM---On Fri, 2014-04-25 at 13:41 +, Akihiro Motoki wrote: Hi,Jay Pipes ---04/25/2014 12:09:43 PM---On Fri, 2014-04-25 at 13:41 +, Akihiro Motoki wrote: Hi, From: Jay Pipes jaypi...@gmail.com To: openstack-dev@lists.openstack.org, Date: 04/25/2014 12:09 PM Subject: Re: [openstack-dev] [Neutron] Flavor(?) Framework __ On Fri, 2014-04-25 at 13:41 +, Akihiro Motoki wrote: Hi, I have a same question from Mark. Why is flavor not desired? My first vote is flavor first, and then type. Some reasons: First, flavor, in English, can and often is spelled differently depending on where you live in the world (flavor vs. flavour). Second, type is the appropriate term for what this is describing, and doesn't have connotations of taste, which flavor does. I could also mention that the term flavor is a vestige of the Rackspace Cloud API and, IMO, should be killed off in place of the more common and better understood instance type which is used by the EC2 API. There is similar cases in other OpenStack projects. Nova uses flavor and Cinder uses (volume) type for similar cases. Both cases are similar to our use cases and I think it is better to use either of them to avoid more confusion from naming for usesr and operators. Cinder volume_type detail is available at [1]. In Cinder volume_type, we can define multiple volume_type for one driver. (more precisely, volume_type is associated to one backend defintion and we can define multiple backend definition for one backend driver). In addition, I prefer to managing flavor/type through API and decoupling flavor/type definition from provider definitions in configuration files as Cinder and Nova do. Yes, I don't believe there's any disagreement on that particular point. This effort is all about trying to provide a more comfortable and reasonable way for classification of these advanced services to be controlled by the user. 
Best, -jay [1] http://docs.openstack.org/admin-guide-cloud/content/multi_backend.html Thanks, Akihiro (2014/04/24 0:05), Eugene Nikanorov wrote: Hi neutrons, A quick question of the ^^^ I heard from many of you that a term 'flavor' is undesirable, but so far there were no suggestions for the notion that we are going to introduce. So please, suggest you name for the resource. Names that I've been thinking of: - Capability group - Service Offering Thoughts? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http
Re: [openstack-dev] [Neutron] Flavor(?) Framework
Marios, I'm working on that right now. Expect the BP to be on gerrit later today. Thanks, Eugene. On Thu, Apr 24, 2014 at 3:40 PM, mar...@redhat.com mandr...@redhat.com wrote: On 24/04/14 13:54, Eugene Nikanorov wrote: Marios, here's the link with the proposal description: https://wiki.openstack.org/wiki/Neutron/FlavorFramework thanks very much (sorry, should have found that easily); I'm wondering if the existing blueprint @[1] and (information from) the wiki can be combined into a 'new' gerrit based neutron-specs review. I can do that if you would appreciate the help (unless you're already on it). Might even be easier to hash out the naming there. thanks! marios [1] https://blueprints.launchpad.net/neutron/+spec/neutron-flavor-framework http://git.openstack.org/cgit/openstack/neutron-specs Mark: personally I find the name 'flavor' suitable because it's the same concept as a nova flavor. So I'll use it in BP/code unless something better comes up. Thanks, Eugene. On Thu, Apr 24, 2014 at 11:09 AM, mar...@redhat.com mandr...@redhat.com wrote: On 23/04/14 18:05, Eugene Nikanorov wrote: Hi neutrons, A quick question of the ^^^ I heard from many of you that a term 'flavor' is undesirable, but so far there were no suggestions for the notion that we are going to introduce. So please, suggest your name for the resource. Names that I've been thinking of: - Capability group - Service Offering Thoughts? Eugene, my apologies but I am lacking context here - is there a discussion/wiki page that may give some background here (what are we trying to name? A specific configuration of a neutron deployment?) thanks, marios Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Neutron] Flavor(?) Framework
Hi neutrons, A quick question of the ^^^ I heard from many of you that a term 'flavor' is undesirable, but so far there were no suggestions for the notion that we are going to introduce. So please, suggest your name for the resource. Names that I've been thinking of: - Capability group - Service Offering Thoughts? Thanks, Eugene. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron] Flavor(?) Framework
Thanks, that can be an option. Just wondering, can we find a single name? Thanks, Eugene. On Wed, Apr 23, 2014 at 7:19 PM, Jay Pipes jaypi...@gmail.com wrote: On Wed, 2014-04-23 at 19:05 +0400, Eugene Nikanorov wrote: Hi neutrons, A quick question of the ^^^ I heard from many of you that a term 'flavor' is undesirable, but so far there were no suggestions for the notion that we are going to introduce. So please, suggest you name for the resource. Names that I've been thinking of: - Capability group - Service Offering Load balancer type VPN type Firewall type Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev