Re: [openstack-dev] [nova][vmware] Convert to rescue by adding the rescue image and booting from it
Definitely +1. This change will allow us to get rid of the ugly special cases where we use the instance name instead of the uuid, and will make the code cleaner. Let's go for it! Thanks, Rado - Original Message - Currently we create a rescue instance by creating a new VM with the original instance's image, then adding the original instance's first disk to it, and booting. This means we have 2 VMs, which we need to be careful of when cleaning up, when suspending, and probably in other edge cases. We also don't support: * Rescue images other than the instance's creation image * Rescue of an instance which wasn't created from an image * Access to cinder volumes from a rescue instance I've created a dirty hack which, instead of creating a new VM, attaches the given rescue image to the VM and boots from it: https://review.openstack.org/#/c/106078/ It works for me. It supports all of the above, doesn't require special handling on destroy, and works with suspend[1]. It also doesn't trigger the spurious warning message about unknown VMs on the cluster which, while unimportant in itself, is an example of an edge case caused by having 2 VMs. Does this seem a reasonable way to go? It would be dependent on a refactoring of the image cache code so we could cache the rescue image. Matt [1] If suspend of a rescued image wasn't broken at the API level, anyway. I have a patch for that: https://review.openstack.org/#/c/106082/ -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) GPG ID: D33C3490 GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Fuel] Neutron ML2 Blueprints
Retested today: ubuntu single nova vlan - works; centos single nova dhcp - works; ubuntu single neutron gre - works; centos single neutron vlan - works; centos ha(1) neutron vlan - fail, haproxy issue; ubuntu ha(1) neutron gre - fail, haproxy issue. haproxy / vip issue: for a reason that I haven't been able to track down yet, the ip netns namespace (public and management) ns_IPaddr2 VIPs cannot ping or otherwise communicate with nodes remote to whoever owns the respective VIP. Once this issue is resolved, I believe that CI should pass, given that the build appears 100% functional except that computes can't connect to the VIP properly. On Thu, Jul 10, 2014 at 1:05 AM, Mike Scherbakov mscherba...@mirantis.com wrote: We had a call between Andrew (@xarses), Vladimir Kuklin (@aglarendil) and myself (@mihgen) today to finally sort out Neutron ML2 integration in Fuel. We didn't have @xenolog on the call, but hopefully he is more or less fine with all below, and kindly request him to confirm :) Discussed the following topics, with agreement from all participants: 1. Risks of merging https://review.openstack.org/#/c/103280 (@xarses, upstream puppet module, will further refer to as 280) vs https://review.openstack.org/#/c/103947 (@xenolog, extending the existing puppet module with ML2 support, will further refer to as 947) - We all agree that 280 is strategically the way to go. It was so by design, and 947 was done only as a risk mitigation plan in case 280 is not ready in time - Both 280 and 947 were manually verified in combinations of ubuntu/centos/vlan/gre/ha; 280 needs to be verified with nova-network - 947 was ready a week ago and is considered to be the more stable solution - 280 has a much higher risk of introducing regressions, as it is basically a re-architecting of the Neutron puppet module in Fuel - 947 has backward compatibility support with legacy OVS, while 280 doesn't have it at the moment 2. Mellanox and VMWare NSX dependency on the ML2 implementation; rebase time - The rebase itself should not be hard - It has to be tested and may take up to next WE to do all rebases/testing/fixing - As both the Mellanox and NSX Neutron pieces are isolated, they can be an exception for merging by next TH 3. Discussed sanitize_network_config [1] - @aglarendil points out that we need to verify the input params which puppet receives in advance, rather than waiting hours for a deploy - @mihgen's point of view is that we need to consider each Fuel component as a module, and verify its output with certain input params. So there is no point in verifying input in puppet if it's already verified in the output of Nailgun. - @xarses says that we need to verify the configuration files created in the system after module execution, to check whether they match the module's input params - Overall the topic has been discussed much more extensively, and it needs further follow-up. @aglarendil confirms that it is OK to remove sanitize for now, but to start writing many more tests right away to check that 280 deploys correctly Action plan we've come up with: 1. Merge 947, as less risky at the moment, also considering the fact that 280 doesn't pass CI - this will immediately unblock the Mellanox and NSX teams so they can rebase on the ML2 implementation 2. In parallel, create a test plan for 280 together with the QA team 3. @xenolog to join 280's effort and start fixing issues discovered there, including [2] 4. Build an ISO based on 280, and start testing it according to the plan 5. Merge 280 once the test plan is executed and issues are fixed. Expected on Monday. 
[1] https://review.openstack.org/#/c/99807/1/specs/5.1/ml2-neutron.rst [2] https://fuel-jenkins.mirantis.com/job/master_fuellib_review_systest_ubuntu/1608/ Thanks, On Wed, Jul 9, 2014 at 10:31 PM, Andrew Woodward xar...@gmail.com wrote: Teams from Mellanox and VMware based their work on the current implementation of Neutron. Mlnx appears to not set any neutron settings, so it won't be enabled correctly without some more TLC ( https://wiki.openstack.org/wiki/Mellanox-Neutron-Icehouse-Redhat#Neutron_Server_Node ). NSX appears to be using legacy OVS, but assumes just whatever the default core_plugin is, so it will need some TLC too. - to merge https://review.openstack.org/#/c/103280 in fuel-5.1 The link is wrong; it should be https://review.openstack.org/#/c/103947 - To base the Neutron implementation for 6.0 on https://review.openstack.org/#/c/103280/ and start working to adapt it for Juno Neutron. I still think we should merge it - Also we should discuss how to make the HA-wrapper more pluggable. https://review.openstack.org/#/c/103279/ makes it very pluggable, and felt like the best start; what else do we need to add in the near term? Should we just extend it
Re: [openstack-dev] [heat] autoscaling across regions and availability zones
Zane Bitter zbit...@redhat.com wrote on 07/10/2014 05:57:14 PM: On 09/07/14 22:38, Mike Spreitzer wrote: Zane Bitter zbit...@redhat.com wrote on 07/01/2014 06:54:58 PM: On 01/07/14 16:23, Mike Spreitzer wrote: ... Hmm, now that I think about it, CloudFormation provides a Fn::GetAZs function that returns a list of available AZs. That suggests an implementation where you can specify an AZ If we're whittling down then it would be one or more AZs, right? when creating the stack and the function returns only that value within that stack (and its children). There's no way in OS::Heat::AutoScalingGroup to specify an intrinsic function that is resolved in the context of the scaling group's nested stack, I am not sure I understand what you mean. Is it: there is no way for the implementation of a resource type to create or modify an intrinsic function? but if the default value of the AZ for OS::Nova::Server were calculated the same way then the user would have the option of omitting the AZ (to allow the autoscaling implementation to control it) I am not sure I get this part. If the scaling group member type is a Compute instance (as well as if it is not) then the template generated by the group (to implement the group) wants to put different resources in different AZs. The nested stack that is the scaling group is given a whole list of AZs as its list-of-AZs parameter value. or overriding it explicitly. At that point you don't even need the intrinsic function. So don't assign a stack to a particular AZ as such, but allow the list of valid AZs to be whittled down as you move toward the leaves of the tree of templates. I partially get the suggestion. Let me repeat it back to see if it sounds right. Let the stack create and update operations gain an optional parameter that is a list of AZs (noting that a stack operation parameter is something different from a parameter specified in a template), constrained to be a subset of the AZs available to the user in Heat's configured region; the default value is the list of all AZs available to the user in Heat's configured region. Redefine the Fn::GetAZs intrinsic to return that new parameter's value. For each resource type that can be given a list of AZs, we (as plugin authors) redefine the default to be the list returned by Fn::GetAZs; for each resource type that can be given a single AZ, we (as plugin authors) redefine the default to be one (which one?) of the AZs returned by Fn::GetAZs. That would probably require some finessing around the schema technology, because a parameter's default value is fixed when the resource type is registered, right? A template generated by scaling group code somehow uses that new stack operation parameter to set the member's AZ when the member is a stack and the scaling group is spanning AZs. Would the new stack operation parameter (list of AZs) be reflected as a property of OS::Heat::Stack? How would that list be passed in a scenario like https://review.openstack.org/#/c/97366/10/hot/asg_of_stacks.yaml,unified where the member type is a template filename and the member's properties are simply the stack's parameters? Can the redefinitions mentioned here be a backward compatibility problem? So yes, the tricky part is how to handle that when the scaling unit is not a server (or a provider template with the same interface as a server). 
One solution would have been to require that the scaled unit was, indeed, either an OS::Nova::Server or a provider template with the same interface as (or a superset of) an OS::Nova::Server, but the consensus was against that. (Another odd consequence of this decision is that we'll potentially be overwriting an AZ specified in the launch config section with one from the list supplied to the scaling group itself.) For provider templates, we could insert a pseudo-parameter containing the availability zone. I think that could be marginally better than taking over one of the user's parameters, but you're basically on the right track IMO. I considered a built-in function or pseudo parameter and rejected them based on a design principle that was articulated in an earlier discussion: no modes. Making the innermost template explicitly declare that it takes an AZ parameter makes it more explicit what is going on. But I agree that this is a relatively minor design point, and would be content to go with a pseudo-parameter if the community really prefers that. Unfortunately, that is not the end of the story, because we still have to deal with other types of resources being scaled. I always advocated for an autoscaling resource where the scaled unit was either a provider stack (if you provided a template) or an OS::Nova::Server (if you didn't), but the
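To make the AZ-whittling idea concrete, here is a minimal sketch of how a scaling group implementation might spread its generated members across the AZ list handed down to its nested stack (all names here are illustrative, not Heat's actual code):

    import itertools

    def distribute_azs(member_ids, azs):
        # Round-robin the generated members across the AZs that were
        # passed down to the scaling group's nested stack.
        az_cycle = itertools.cycle(azs)
        return {member: next(az_cycle) for member in member_ids}

    # distribute_azs(['member-0', 'member-1', 'member-2'], ['az1', 'az2'])
    # -> {'member-0': 'az1', 'member-1': 'az2', 'member-2': 'az1'}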
Re: [openstack-dev] [all] Treating notifications as a contract
tl;dr: Having a minimum payload standard enables more and more robust services on both sides of the notification bus. Yes, that's exactly the point of this thread. We do not have a standard format for the payload. I think we should (more on that below). Again, such standardization is exactly the point of this thread. But I guess you're suggesting not only that we version/schematize individual notification payloads, but that we do so in a way that's global across event types and emitters? Well, the notifications emitted are intended to be general-purpose in nature. The general purpose is that they are notifications of some info, somewhat like described above. Well, yeah. If the payload had a guaranteed base form with known structure and keys then consumption becomes easier. The structure doesn't have to limit what _can_ be there; it should describe what must be there to be adequate: if there are multiple samples, how is that sequence represented (a list); in each sample what are the required fields (name, time, unit, volume to guess at a few); in the parent framing what are the required fields (e.g. source, type). I just started the code for processing notifications from Ironic. Conceptually they are the same as notifications from Nova but the actual form of the payload is completely different. This means I have to write a different processor for that payload. And now so does StackTach if they want to handle it. So for the purposes of grounding the discussion, can you give an example of what the Ironic notification payload might look like in a perfect world? (Just a link to a paste of a hand-crafted example would suffice) Yes, this moves the onus of creating well-formed metrics to notifying systems, but that is good: It is providing exactly the sort of clean and easy to test contract at the boundaries that leads to good neighbors and easy testing. OK, so the thing to note is that the set of potential consumers is not limited to services with a metrics-oriented view of the world. So for example it's been proposed, or is actually a reality, that at least the following different types of beasts are interested in consuming notifications: * services with a metrics-oriented PoV (e.g. Ceilometer) * services with an events-oriented PoV (e.g. StackTach) * services with a UX-oriented PoV (e.g. Horizon) * services with an entitlement-enforcement PoV (e.g. Katello) I'm not sure that moving the onus of creating well-formed metrics to notifying systems is feasible, when many of the actual and potential consumers of these data don't actually have a strictly metrics-oriented perspective. Problem is, the requirements of Ceilometer, StackTach, and indeed NewShinyMeteringShiz, are probably gonna be different. Is that really a problem or do we fear it is? Don't they all want a thing that has the same basic form and info (an event-like thingie with a value)? Well I think it is a real problem, given the breadth of consumers as described above. But in any case, I'm a bit confused by where this line of reasoning is headed. We seem to have gone from the notifying service directly emitting well-formed samples that ceilometer can just directly persist, to more generic event-like thingies. Have I misread that point about *directly* emitting samples? Having a standard notification payload format would of course mean change, but we know that flexible metering/auditing is very important for the OpenStack universe. 
Your argument seems to be that having such a standard, predisposed to ceilometer, would limit flexibility and lose capability. Yep, that is exactly my point. Predisposition of the notification format to ceilometer's needs is what concerns me. As opposed to the notion of standardization/schematization/versioning, which is the explicit goal of the discussion. Take my example from above, the processing of Ironic notifications. I think it is weird that I had to write code for that. Does it not seem odd to you? OK, so have we strayed into an orthogonal concern here? i.e. not that ceilometer requires some translation to be done, but that this translation must be hand-crafted in Python code as opposed to being driven declaratively via some configured mapping rules? Cheers, Eoghan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On 07/10/2014 06:46 PM, Mark McLoughlin wrote: On Thu, 2014-07-03 at 16:27 +0100, Mark McLoughlin wrote: Hey This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. I got a little behind on this thread, but maybe it'd be helpful to summarize some things from this good discussion: Thanks for summing the thread up. - Is moving to asyncio really a priority compared to other things? I think Victor has made a good case on what's wrong with eventlet [1] and, personally, I'm excited about the prospect of the Python community more generally converging on asyncio. Understanding what OpenStack would need in order to move to asyncio will help the asyncio effort more generally. Figuring through some of this stuff is a priority for Victor and others, but no-one is saying it's an immediate priority for the whole project. Agreed. Let's not underestimate the contributions OpenStack as a community has made to Python and the fact that it can/should keep making them. Experimenting with asyncio will bring to light things that can be contributed back to the community and it'll also help create new scenarios and use-cases around asyncio. - Moving from an implicitly async to an explicitly async programming model has enormous implications and we need to figure out what it means for libraries like SQLAlchemy and abstraction layers like ORMs. I think that's well understood - the topic of this thread is merely how to make a small addition to oslo.messaging (the ability to dispatch asyncio co-routines on eventlet) so that we can move on to figuring out the next piece of the puzzle. Let's take one step at a time. oslo.messaging is a core piece of OpenStack but it's also a library that can be used outside OpenStack. Having support for explicit async in oslo.messaging is a good thing for the library itself regardless of whether it'll be adopted throughout OpenStack in the long run. - Taskflow vs asyncio - good discussion, plenty to figure out. They're mostly orthogonal concerns IMHO but *maybe* we decide adopting both makes sense and that both should be adopted together. I'd like to see more concrete examples showing taskflow vs asyncio vs taskflow/asyncio to understand better. +1 So, tl;dr is that lots of work remains to even begin to understand how exactly asyncio could be adopted and whether that makes sense. The thread raises some interesting viewpoints, but I don't think it moves our understanding along all that much. The initial mail was simply about unlocking one very small piece of the puzzle. Agreed. I'm happy to help move this effort forward and gather some real-life results on which we can base future plans and decisions. Flavio. -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] fastest way to run individual tests ?
On 07/09/2014 10:51 PM, Matt Riedemann wrote: On 6/12/2014 6:17 AM, Daniel P. Berrange wrote: On Thu, Jun 12, 2014 at 07:07:37AM -0400, Sean Dague wrote: On 06/12/2014 06:59 AM, Daniel P. Berrange wrote: Does anyone have any tips on how to actually run individual tests in an efficient manner, i.e. something that adds no more than a 1 second penalty over and above the time to run the test itself? NB, assume that I've primed the virtualenv with all prerequisite deps already. The overhead is in the fact that we have to discover the world, then throw out the world. You can actually run an individual test by invoking testtools.run directly: python -m testtools.run nova.tests.test_versions (Also, when testr explodes because of an import error this is about the only way to debug what's going on). Most excellent, thank you. I knew someone must know a way to do it :-) Regards, Daniel I've been beating my head against the wall a bit on unit tests too this week, and here is another tip that just uncovered something for me when python -m testtools.run and nosetests didn't help. I sourced the tox virtualenv and then ran the test from there, which gave me the actual error, so something like this:

    source .tox/py27/bin/activate
    python -m testtools.run test

Props to Matt Odden for helping me with the source-the-venv tip. FWIW - this is what ./run_tests.sh -d does, but it also prepends the lockutils invocation Vish mentioned that is needed for some tests to run. I've also noticed several bugs when running functional tests this way, especially those that start services and parse config options when we don't load the whole world, which no one seems to be fixing, so I assumed that usage of this was not very widespread (I assume because people used tox). N. PS. Now that I think about it - I've also not submitted fixes for them... :) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [keystone/swift] role-based access cotrol in swift
John, Thank you for your quick response. On Friday, July 11, 2014 12:33 PM John Dickinson m...@not.mn wrote: Some of the above may be in line with what you're looking for. They are exactly what I'm looking for. First I will look at the code of the policy engine to see whether I can use it. Thanks again, Hisashi Oasnai ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
I just started the code for processing notifications from Ironic. Conceptually they are the same as notifications from Nova but the actual form of the payload is completely different. This means I have to write a different processor for that payload. And now so does StackTach if they want to handle it. The data format that Ironic will send was part of the proposed spec and could have been reviewed. I think there's still time to change it tho; if you have a better format, talk to Haomeng, who is the guy responsible for that work in Ironic, and see if he can change it (we can put up a follow-up patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. Cheers, Lucas ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Fri, 11 Jul 2014, Eoghan Glynn wrote: But I guess you're suggesting not only that we version/schematize individual notification payloads, but that we do so in a way that's global across event types and emitters? That's partially correct. I'm suggesting that we consider standardizing a general format for notification payloads across emitters. _Not_ individual notification payloads. Schematizing individual payloads strikes me as heavyweight[1] when it _might_[2] be easier to declare that a list of dicts with certain required fields is sufficient to represent a multiplicity of value-oriented events which are consumed as any of metric, event, visualization data-point, etc. Elsewhere there's been some discussion of ActivityStreams as a possible model. It may be, they have the same basic idea: stuff that happens can be represented by a sequence of relatively stupid event-like things. The initial version of ActivityStreams was based on atom+xml but later people realized this would be a ton easier if it was some well known and well formed JSON. Take that line of evolution a little bit further and you've got a list of dicts with certain required fields. (more below) [1] A useful turn of phrase learned in this message: http://lists.openstack.org/pipermail/openstack-dev/2014-July/039941.html [2] Emphasis on the might, meaning might be worthy of consideration at least as a strawman to help illuminate concerns and benefits. So for the purposes of grounding the discussion, can you give an example of what the Ironic notification payload might look like in a perfect world? I can't give you a perfect example because I haven't had an opportunity to understand all the needs and issues but a rough quickie to try to illustrate the point is here: http://paste.openstack.org/show/86071/ The main difference there is that the value and the unit are provided in known fields at a known descent into the data structure, rather than needing to be extracted from strings that are values at custom keys. I'd rather we not get hung up on the details of the representation as that's the way rabbits go. If the concept has merit the representation can follow. Yes, this moves the onus of creating well-formed metrics to notifying systems, but that is good: It is providing exactly the sort of clean and easy to test contract at the boundaries that leads to good neighbors and easy testing. OK, so the thing to note is that the set of potential consumers is not limited to services with a metrics-oriented view of the world. I'm of the opinion that it _may_ be possible that metrics-oriented is a subclass of something more generic that might be representable. The CADF folk seem to think so (although I'm not suggesting we go down that road, cf. heavyweight). * services with a metrics-oriented PoV (e.g. Ceilometer) * services with an events-oriented PoV (e.g. StackTach) * services with a UX-oriented PoV (e.g. Horizon) * services with an entitlement-enforcement PoV (e.g. Katello) These all sound like they are in at least the same ballpark. I think the common term would be events, and a metric is a type of event? But in any case, I'm a bit confused by where this line of reasoning is headed. We seem to have gone from the notifying service directly emitting well-formed samples that ceilometer can just directly persist, to more generic event-like thingies. Have I misread that point about *directly* emitting samples? 
No, you are seeing the evolution of my thinking: as I write about this stuff and try to understand it, it becomes more clear (to me) that samples ought to be treated as event-like thingies. Having a standard notification payload format would of course mean change, but we know that flexible metering/auditing is very important for the OpenStack universe. Your argument seems to be that having such a standard, predisposed to ceilometer, would limit flexibility and lose capability. Yep, that is exactly my point. Predisposition of the notification format to ceilometer's needs is what concerns me. As opposed to the notion of standardization/schematization/versioning which is the explicit goal of the discussion. Okay, and my proposal is not to have a standard predisposed to ceilometer. It is to make a limited standard for a large class of notifications and _not_ schematize individual notification types (more on this aspect below) and then change ceilometer so it can use the plan. Take my example from above, the processing of Ironic notifications. I think it is weird that I had to write code for that. Does it not seem odd to you? OK, so have we strayed into an orthogonal concern here? I don't think so, it was supposed to be an example of the cost of there not being an existing general standard. If there were such a standard I wouldn't have had to write any code, only the Ironic folk would and I would have had the free time to help them. Less code == good! Similarly if there are individual schema for the
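For readers following along, here is a hand-waved sketch of the "list of dicts with certain required fields" idea (the field names are invented for illustration and this is not the content of the paste linked above):

    notification = {
        # required framing fields
        'source': 'ironic',
        'type': 'hardware.ipmi.temperature',
        # the payload is a list of sample-like dicts, each with a
        # small set of required fields...
        'payload': [
            {'name': 'System Temp',
             'time': '2014-07-11T12:00:00Z',
             'unit': 'C',
             'volume': 51,
             # ...and anything else the emitter knows goes in extras
             'extras': {'sensor_id': 'System Temp (0x1)'}},
        ],
    }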
Re: [openstack-dev] [all] Treating notifications as a contract
On Fri, 11 Jul 2014, Lucas Alvares Gomes wrote: The data format that Ironic will send was part of the proposed spec and could have been reviewed. I think there's still time to change it tho; if you have a better format, talk to Haomeng, who is the guy responsible for that work in Ironic, and see if he can change it (we can put up a follow-up patch to fix the spec with the new format as well). But we need to do this ASAP because we want to get it landed in Ironic soon. It was only after doing the work that I realized how it might be an example for the sake of this discussion. As the architecture of Ceilometer currently exists there still needs to be some measure of custom code, even if the notifications are as I described them. However, if we want to take this opportunity to move some of the smarts from Ceilometer into the Ironic code then the paste that I created might be a guide to making it possible: http://paste.openstack.org/show/86071/ However on that however, if there's some chance that a large change could happen, it might be better to wait, I don't know. -- Chris Dent tw:@anticdent freenode:cdent https://tank.peermore.com/tanks/cdent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][ML2] Support dpdk ovs with ml2 plugin
Can you explain what's the use case for running both ovs and userspace ovs on the same host? Thanks Przemek From: loy wolfe [mailto:loywo...@gmail.com] Sent: Friday, July 11, 2014 3:17 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][ML2] Support dpdk ovs with ml2 plugin +1 ovs and userspace ovs are totally different. Also, there is a strong need to keep ovs even when we have a userspace ovs on the same host -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On 11/07/14 02:04, Michael Still wrote: Sorry for the delay here. This email got lost in my inbox while I was travelling. This release is now tagged. Additionally, I have created a milestone for this release in launchpad, which is the keystone process for client releases. This means that users of launchpad can now see what release a given bug was fixed in, and improves our general launchpad bug hygiene. However, because we haven't done this before, this first release is a bit bigger than it should be. I'm having some pain marking the milestone as released in launchpad, but I am arguing with launchpad about that now. Michael Cough, this broke horizon stable and master; heat stable is affected as well. For Horizon, I filed bug https://bugs.launchpad.net/horizon/+bug/1340596 Matthias ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
Matthias Runge wrote: On 11/07/14 02:04, Michael Still wrote: Sorry for the delay here. This email got lost in my inbox while I was travelling. This release is now tagged. Additionally, I have created a milestone for this release in launchpad, which is the keystone process for client releases. This means that users of launchpad can now see what release a given bug was fixed in, and improves our general launchpad bug hygiene. However, because we haven't done this before, this first release is a bit bigger than it should be. I'm having some pain marking the milestone as released in launchpad, but I am arguing with launchpad about that now. Michael Cough, this broke horizon stable and master; heat stable is affected as well. For Horizon, I filed bug https://bugs.launchpad.net/horizon/+bug/1340596 The same bug (https://bugs.launchpad.net/bugs/1340596) will be used to track Heat tasks as well. -- Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][ML2] Support dpdk ovs with ml2 plugin
A simple use case could be to have a compute node able to start VMs with optimized net I/O or standard net I/O, depending on the network flavor ordered for the VM. On Fri, Jul 11, 2014 at 11:16 AM, Czesnowicz, Przemyslaw przemyslaw.czesnow...@intel.com wrote: Can you explain what's the use case for running both ovs and userspace ovs on the same host? Thanks Przemek From: loy wolfe [mailto:loywo...@gmail.com] Sent: Friday, July 11, 2014 3:17 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Neutron][ML2] Support dpdk ovs with ml2 plugin +1 ovs and userspace ovs are totally different. Also, there is a strong need to keep ovs even when we have a userspace ovs on the same host -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
Hi, On Monday, 7 July 2014 at 19:18:38, Mark McLoughlin wrote: I'd expect us to add e.g. @asyncio.coroutine def call_async(self, ctxt, method, **kwargs): ... to RPCClient. Perhaps we'd need to add an AsyncRPCClient in a separate module and only add the method there - I don't have a good sense of it yet. I don't want to make trollius a mandatory dependency of Oslo Messaging, at least not right now. An option is to only declare the method if trollius is installed:

    try:
        import trollius
    except ImportError:
        trollius = None

and then:

    if trollius is not None:
        @trollius.coroutine
        def call_async(self, ctxt, method, **kwargs):
            ...

Or maybe a different module (maybe using a subclass) is better. Victor ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
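Spelled out a little more fully, the conditional definition might look like the sketch below. The subclass name and the executor-based body are assumptions for illustration; real integration with oslo.messaging's executors would differ:

    import functools

    try:
        import trollius
    except ImportError:
        trollius = None

    if trollius is not None:
        # RPCClient here is oslo.messaging's existing client class;
        # the subclass name and body are illustrative only.
        class AsyncRPCClient(RPCClient):
            @trollius.coroutine
            def call_async(self, ctxt, method, **kwargs):
                # Run the existing blocking call() in an executor and
                # yield until it completes.
                loop = trollius.get_event_loop()
                result = yield trollius.From(loop.run_in_executor(
                    None,
                    functools.partial(self.call, ctxt, method, **kwargs)))
                raise trollius.Return(result)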
Re: [openstack-dev] [Infra] Jenkins gate jobs fails
hi, Clark, I have tried recheck several times and got Jenkins to pass. Thanks for the explanation. On Fri, Jul 11, 2014 at 1:29 PM, Clark Boylan clark.boy...@gmail.com wrote: On Thu, Jul 10, 2014 at 10:12 PM, stanzgy stan@gmail.com wrote: Several jenkins gate jobs failed since some lib packages in ubuntu source are missing and devstack failed to set up the tempest env. Could someone help fix this? I have filed the bug here: https://bugs.launchpad.net/openstack-ci/+bug/1340514
2014-07-11 02:35:42.333 | + apt_get install qemu-kvm
2014-07-11 02:35:42.336 | + sudo DEBIAN_FRONTEND=noninteractive http_proxy= https_proxy= no_proxy= apt-get --option Dpkg::Options::=--force-confold --assume-yes install qemu-kvm
2014-07-11 02:35:42.358 | Reading package lists...
2014-07-11 02:35:42.602 | Building dependency tree...
2014-07-11 02:35:42.604 | Reading state information...
2014-07-11 02:35:42.769 | The following packages were automatically installed and are no longer required:
2014-07-11 02:35:42.769 | python-colorama python-distlib python-html5lib
2014-07-11 02:35:42.769 | Use 'apt-get autoremove' to remove them.
2014-07-11 02:35:42.796 | The following extra packages will be installed:
2014-07-11 02:35:42.796 | cpu-checker ipxe-qemu libbluetooth3 libbrlapi0.6 libcaca0 libfdt1
2014-07-11 02:35:42.796 | libsdl1.2debian libseccomp2 libspice-server1 libusbredirparser1 libxen-4.4
2014-07-11 02:35:42.796 | libxenstore3.0 libyajl2 msr-tools qemu-keymaps qemu-system-common
2014-07-11 02:35:42.796 | qemu-system-x86 seabios
2014-07-11 02:35:42.797 | Suggested packages:
2014-07-11 02:35:42.797 | samba vde2 sgabios
2014-07-11 02:35:42.798 | The following NEW packages will be installed:
2014-07-11 02:35:42.798 | cpu-checker ipxe-qemu libbluetooth3 libbrlapi0.6 libcaca0 libfdt1
2014-07-11 02:35:42.798 | libsdl1.2debian libseccomp2 libspice-server1 libusbredirparser1 libxen-4.4
2014-07-11 02:35:42.798 | libxenstore3.0 libyajl2 msr-tools qemu-keymaps qemu-kvm qemu-system-common
2014-07-11 02:35:42.798 | qemu-system-x86 seabios
2014-07-11 02:35:42.851 | 0 upgraded, 19 newly installed, 0 to remove and 0 not upgraded.
2014-07-11 02:35:42.851 | Need to get 291 kB/3985 kB of archives.
2014-07-11 02:35:42.851 | After this operation, 20.4 MB of additional disk space will be used.
2014-07-11 02:35:42.851 | Err http://mirror.rackspace.com/ubuntu/ trusty-security/main libxenstore3.0 amd64 4.4.0-0ubuntu5.1
2014-07-11 02:35:42.851 | 404 Not Found
2014-07-11 02:35:42.855 | Err http://mirror.rackspace.com/ubuntu/ trusty-security/main libxen-4.4 amd64 4.4.0-0ubuntu5.1
2014-07-11 02:35:42.855 | 404 Not Found
2014-07-11 02:35:42.858 | E: Failed to fetch http://mirror.rackspace.com/ubuntu/pool/main/x/xen/libxenstore3.0_4.4.0-0ubuntu5.1_amd64.deb 404 Not Found
2014-07-11 02:35:42.858 |
2014-07-11 02:35:42.858 | E: Failed to fetch http://mirror.rackspace.com/ubuntu/pool/main/x/xen/libxen-4.4_4.4.0-0ubuntu5.1_amd64.deb 404 Not Found
2014-07-11 02:35:42.858 |
2014-07-11 02:35:42.858 | E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
-- Best Regards ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev This bug is a duplicate of https://bugs.launchpad.net/bugs/1286818 I have gone ahead and marked it that way. The issue here is the rackspace mirrors periodically go sideways and don't work properly. We think this is because they are not syncing from ubuntu safely. There are a couple options to fix this. 
We can run our own ubuntu, centos, and fedora mirrors. There has been some work to get this going, https://review.openstack.org/#/c/89928/1 and https://review.openstack.org/#/c/90875/. pleia2 and dprince should know more. Rackspace could also correct their mirror syncing. Or we could possibly point at different mirrors entirely, but that probably won't be any better than rackspace in the long run due to the Internet being unreliable. The first option gives us the most control and ability to react if things break. Feel free to review and/or update those changes as appropriate. Clark ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best Regards, Gengyuan Zhang NetEase Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Thu, Jul 10, 2014 at 11:51 PM, Outlook harlo...@outlook.com wrote: On Jul 10, 2014, at 3:48 AM, Yuriy Taraday yorik@gmail.com wrote: On Wed, Jul 9, 2014 at 7:39 PM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700: On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: I think Clint's response was likely better than what I can write here, but I'll add-on a few things, How do you write such code using taskflow? @asyncio.coroutine def foo(self): result = yield from some_async_op(...) return do_stuff(result) The idea (at a very high level) is that users don't write this; What users do write is a workflow, maybe the following (pseudocode): # Define the pieces of your workflow. TaskA(): def execute(): # Do whatever some_async_op did here. def revert(): # If execute had any side-effects undo them here. TaskFoo(): ... # Compose them together flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"), TaskFoo("my-foo")) I wouldn't consider this composition very user-friendly. So just to make this understandable, the above is a declarative structure of the work to be done. I'm pretty sure it's generally agreed [1] in the programming world that when declarative structures can be used they should be (imho openstack should also follow the same pattern more than it currently does). The above is a declaration of the work to be done and the ordering constraints that must be followed. It's just one of X ways to do this (feel free to contribute other variations of these 'patterns' @ https://github.com/openstack/taskflow/tree/master/taskflow/patterns). [1] http://latentflip.com/imperative-vs-declarative/ (and many many others). I totally agree that the declarative approach is better for workflow declarations. I'm just saying that we can do it in Python with coroutines instead. Note that the declarative approach can lead to reinvention of an entirely new language, and these flow.add calls can be the first step on this road. I find it extremely user friendly when I consider that it gives you clear lines of delineation between the way it should work and what to do when it breaks. So does plain Python. But with plain Python you don't have to explicitly use graph terminology to describe the process. I'm not sure where in the above you saw graph terminology. All I see there is a declaration of a pattern that explicitly says run things one after the other (linearly). As long as the workflow is linear there's no difference whether it's declared with .add() or with yield from. I'm talking about more complex workflows like the one I described in my example. # Submit the workflow to an engine, let the engine do the work to execute it (and transfer any state between tasks as needed). The idea here is that when things like this are declaratively specified the only thing that matters is that the engine respects that declaration; not whether it uses asyncio, eventlet, pigeons, threads, remote workers[1]. It also adds some things that are not (imho) possible with co-routines (in part since they are at such a low level) like stopping the engine after 'my-task-a' runs and shutting off the software, upgrading it, restarting it and then picking back up at 'my-foo'. It's absolutely possible with coroutines and might provide an even clearer view of what's going on. Like this: @asyncio.coroutine def my_workflow(ctx, ...): project = yield from ctx.run_task(create_project()) # Hey, we don't want to be linear. How about parallel tasks? 
volume, network = yield from asyncio.gather( ctx.run_task(create_volume(project)), ctx.run_task(create_network(project)), ) # We can put anything here - why not branch a bit? if create_one_vm: yield from ctx.run_task(create_vm(project, network)) else: # Or even loops - why not? for i in range(network.num_ips()): yield from ctx.run_task(create_vm(project, network)) Sorry but the code above is nothing like the code that Josh shared. When create_network(project) fails, how do we revert its side effects? If we want to resume this flow after reboot, how does that work? I understand that there is a desire to write everything in beautiful python yields, try's, finally's, and excepts. But the reality is that python's stack is lost the moment the process segfaults, power goes out on that PDU, or the admin rolls out a new kernel. We're not saying asyncio vs. taskflow. I've seen that mistake twice already in this thread. Josh and I are suggesting that if there is a movement to think about coroutines, there should also be some time spent thinking at a high level: how do we resume tasks, revert side effects, and control flow? If we
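For comparison, here is a minimal runnable version of the declarative style discussed above, using taskflow's actual pattern API (the task names and print statements are placeholders):

    import taskflow.engines
    from taskflow.patterns import linear_flow
    from taskflow import task

    class TaskA(task.Task):
        def execute(self):
            print("doing some work")

        def revert(self, result, **kwargs):
            # Undo execute()'s side-effects if a later task fails.
            print("undoing the work")

    class TaskFoo(task.Task):
        def execute(self):
            print("doing foo")

    flow = linear_flow.Flow("my-stuff").add(
        TaskA("my-task-a"),
        TaskFoo("my-foo"))

    # The engine decides how to run the flow (serially, in parallel,
    # via remote workers...); the declaration stays the same.
    taskflow.engines.run(flow)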
Re: [openstack-dev] [all] Treating notifications as a contract
But I guess you're suggesting not only that we version/schematize individual notification payloads, but that we do so in a way that's global across event types and emitters? That's partially correct. I'm suggesting that we consider standardizing a general format for notification payloads across emitters. _Not_ individual notification payloads. So I'm open to correction, but I don't think anyone suggested using egregiously different formats for different notifications. A notification of compute.instance.create.start is naturally going to carry different types of data than a volume.snapshot.delete.end for example, but of course we'd seek to accommodate that difference within a generic structure as far as possible. Schematizing individual payloads strikes me as heavyweight[1] when it _might_[2] be easier to declare that a list of dicts with certain required fields is sufficient to represent a multiplicity of value-oriented events which are consumed as any of metric, event, visualization data-point, etc. OK, so lightweight schemas are lighter in weight than heavyweight schemas. Yep, agreed. Elsewhere there's been some discussion of ActivityStreams as a possible model. Is that the social media syndication format you're talking about? That's been discussed elsewhere as a possible model for openstack notifications? I missed that, can you provide a link? It may be, they have the same basic idea: stuff that happens can be represented by a sequence of relatively stupid event-like things. OK, I might be missing something here, but we seem to have a close approximation of that already: stuff happens == events pop out on the bus So is your point that our events aren't dumb enough? (e.g. encode too much structure, or carry too much data, or require too much interpretation?) The initial version of ActivityStreams was based on atom+xml but later people realized this would be a ton easier if it was some well known and well formed JSON. Are you suggesting here we adopt atom+json? So for the purposes of grounding the discussion, can you give an example of what the Ironic notification payload might look like in a perfect world? I can't give you a perfect example because I haven't had an opportunity to understand all the needs and issues but a rough quickie to try to illustrate the point is here: http://paste.openstack.org/show/86071/ The main difference there is that the value and the unit are provided in known fields at a known descent into the data structure, rather than needing to be extracted from strings that are values at custom keys. OK, so in this case, can we sum it up with: * provide an ID field * parse out the Sensor Reading into value and units, dropping the delta * push the concept of cumulative versus delta versus gauge into the producer * dump the stuff that ceilometer is not so interested in into an 'extras' dict So, cool, we've made our job a little easier on the ceilometer side. Has the notification lost any expressive power? Not much by the looks of it. Is it a good idea to drop for example the error range on the floor because ceilometer doesn't know what to do with it? Dunno, that would depend on whether (a) the delta value reported by IPMI is actually meaningful and (b) some other consumer can do something smart with it. I don't know enough about IPMI to comment on (a). My instinct would be to leave the door open for (b). But overall, it doesn't look like too radical a change in format. I'd rather we not get hung up on the details of the representation as that's the way rabbits go. 
If the concept has merit the representation can follow. Yes, this moves the onus of creating well-formed metrics to notifying systems, but that is good: It is providing exactly the sort of clean and easy to test contract at the boundaries that leads to good neighbors and easy testing. OK, so the thing to note is that the set of potential consumers is not limited to services with a metrics-oriented view of the world. I'm of the opinion that it _may_ be possible that metrics-oriented is a subclass of something more generic that might be representable. The CADF folk seem to think so (although I'm not suggesting we go down that road, cf. heavyweight). And again, unless I'm missing something, that's pretty much what I was saying. Re-using your phraseology: creating well-formed metrics is not necessarily what we want the producer to do, because metrics-oriented is a subclass of something more generic * services with a metrics-oriented PoV (e.g. Ceilometer) * services with an events-oriented PoV (e.g. StackTach) * services with a UX-oriented PoV (e.g. Horizon) * services with an entitlement-enforcement PoV (e.g. Katello) These all sound like they are in at least the same ballpark. I think the common term would be events, and a metric is a type of event? Yes. Yes. And thrice yes! :) The
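As a strawman for what "parse out the Sensor Reading into value and units" could mean on the producer side (the reading string below is a guess at ipmitool-style output, not a confirmed Ironic format):

    import re

    raw = "51 (+/- 0) degrees C"  # hypothetical raw IPMI reading

    match = re.match(r"([\d.]+)\s*\(\+/-\s*([\d.]+)\)\s*(.+)", raw)
    if match:
        sample = {
            'volume': float(match.group(1)),  # the value: 51.0
            'unit': match.group(3),           # e.g. 'degrees C'
            # the error range is kept around for consumers that care
            'extras': {'delta': float(match.group(2))},
        }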
Re: [openstack-dev] [Nova] [Gantt] Scheduler split status (updated)
On 10 July 2014 16:59, Sylvain Bauza sba...@redhat.com wrote: On 10/07/2014 15:47, Russell Bryant wrote: On 07/10/2014 05:06 AM, Sylvain Bauza wrote: Hi all, === tl;dr: Now that we agree on waiting for the split prereqs to be done, we debate on whether ResourceTracker should be part of the scheduler code and consequently whether the Scheduler should expose ResourceTracker APIs so that Nova wouldn't own compute node resources. I'm proposing to first come with RT as a Nova resource in Juno and move ResourceTracker into the Scheduler for K, so we at least merge some patches by Juno. === Some debates occurred recently about the scheduler split, so I think it's important to loop back with you all to see where we are and what the discussions are. Again, feel free to express your opinions, they are welcome. Where did this resource tracker discussion come up? Do you have any references that I can read to catch up on it? I would like to see more detail on the proposal for what should stay in Nova vs. be moved. What is the interface between Nova and the scheduler here? Oh, missed the most important question you asked. So, about the interface between scheduler and Nova, the originally agreed proposal is in the spec https://review.openstack.org/82133 (approved) where the Scheduler exposes: - select_destinations() : for querying the scheduler to provide candidates - update_resource_stats() : for updating the scheduler internal state (ie. HostState) Here, update_resource_stats() is called by the ResourceTracker, see the implementations (in review) https://review.openstack.org/82778 and https://review.openstack.org/104556. The alternative that has just been raised this week is to provide a new interface where the ComputeNode claims resources and frees them, so that all the resources are fully owned by the Scheduler. An initial PoC has been raised here https://review.openstack.org/103598 but I tried to see what a ResourceTracker proxied by a Scheduler client would look like here: https://review.openstack.org/105747. As the spec hasn't been written, the names of the interfaces are not properly defined but I made a proposal as: - select_destinations() : same as above - usage_claim() : claim a resource amount - usage_update() : update a resource amount - usage_drop() : frees the resource amount Again, this is a dummy proposal; a spec has to be written if we consider moving the RT. While I am not against moving the resource tracker, I feel we could move this to Gantt after the core scheduling has been moved. I was imagining the extensible resource tracker becoming (sort of) equivalent to cinder volume drivers. Also the persistent resource claims will give us another plugin point for gantt. That might not be enough, but I think it is easier to see once the other elements have moved. But the key thing I like is how the current approach amounts to refactoring, similar to the cinder move. I feel we should stick to that if possible. John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
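For clarity, here is a rough sketch of what the claims-based alternative could look like as a scheduler client interface, using the method names from the proposal above (the signatures are assumptions, since the spec hasn't been written):

    class SchedulerClient(object):
        def select_destinations(self, context, request_spec, filter_props):
            """Ask the scheduler for candidate hosts (same as today)."""

        def usage_claim(self, context, host, resources):
            """Claim a resource amount on a host; returns a claim id."""

        def usage_update(self, context, claim_id, resources):
            """Update a previously claimed resource amount."""

        def usage_drop(self, context, claim_id):
            """Free the resources held by a claim."""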
Re: [openstack-dev] [nova][vmware] Convert to rescue by adding the rescue image and booting from it
On 10 July 2014 16:52, Matthew Booth mbo...@redhat.com wrote: Currently we create a rescue instance by creating a new VM with the original instance's image, then adding the original instance's first disk to it, and booting. This means we have 2 VMs, which we need to be careful of when cleaning up, when suspending, and probably in other edge cases. We also don't support: * Rescue images other than the instance's creation image * Rescue of an instance which wasn't created from an image * Access to cinder volumes from a rescue instance I've created a dirty hack which, instead of creating a new VM, attaches the given rescue image to the VM and boots from it: https://review.openstack.org/#/c/106078/ I do worry about different drivers having such radically different implementation approaches. Currently rescue only attaches the root disk to the rescue image. Having a separate VM does side-step having to work out where to reattach all the disks when you boot up the original VM, as you haven't modified that. But there are plans to change that here: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/rescue-attach-all-disks.rst You can now specify an image when you go into rescue mode: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/allow-image-to-be-specified-during-rescue.rst I guess the rescue image could technically change how the VM boots, or what hardware it has attached, so you might end up making so many tweaks to the original VM that you might want to just create a new VM, then throw away those changes when you restore the original VM. It feels a lot like we need to better understand the use cases for this feature, and work out what we need in the long term. Does this seem a reasonable way to go? Maybe, but I am not totally keen on making it such a different implementation to all the other drivers. Mostly for the sake of people who might run two hypervisors in their cloud, or people who support customers running various hypervisors. John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][vmware] Convert to rescue by adding the rescue image and booting from it
On Fri, Jul 11, 2014 at 12:30:19PM +0100, John Garbutt wrote: On 10 July 2014 16:52, Matthew Booth mbo...@redhat.com wrote: Currently we create a rescue instance by creating a new VM with the original instance's image, then adding the original instance's first disk to it, and booting. This means we have 2 VMs, which we need to be careful of when cleaning up, when suspending, and probably in other edge cases. We also don't support: * Rescue images other than the instance's creation image * Rescue of an instance which wasn't created from an image * Access to cinder volumes from a rescue instance I've created a dirty hack which, instead of creating a new VM, attaches the given rescue image to the VM and boots from it: https://review.openstack.org/#/c/106078/ I do worry about different drivers having such radically different implementation approaches. Currently rescue only attaches the root disk to the rescue image. Having a separate VM does side-step having to work out where to reattach all the disks when you boot up the original VM, as you haven't modified that. But there are plans to change that here: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/rescue-attach-all-disks.rst You can now specify an image when you go into rescue mode: http://git.openstack.org/cgit/openstack/nova-specs/tree/specs/juno/allow-image-to-be-specified-during-rescue.rst I guess the rescue image could technically change how the VM boots, or what hardware it has attached, so you might end up making so many tweaks to the original VM that you might want to just create a new VM, then throw away those changes when you restore the original VM. It feels a lot like we need to better understand the use cases for this feature, and work out what we need in the long term. Does this seem a reasonable way to go? Maybe, but I am not totally keen on making it such a different implementation to all the other drivers. Mostly for the sake of people who might run two hypervisors in their cloud, or people who support customers running various hypervisors. My view is that rescue mode should have as few differences from normal mode as possible. Ideally the exact same VM configuration would be used, with the exception that you add in one extra disk and set the BIOS to boot off that new disk. The spec you mention above gets us closer to that in libvirt, but it still has the problem that it re-shuffles the disk order. To fix this I think we need to change the rescue image disk so that instead of being a virtio-blk or IDE disk, it is a hotplugged USB disk, and make the BIOS boot from this USB disk. That way none of the existing disk attachments will change in any way. This would also feel more like the way a physical machine would be rescued, where you would typically insert a bootable CDROM or a rescue USB stick. So in that sense I think what Matt suggests for VMWare is good because it gets the vmware driver moving in the right direction. I'd encourage them to also follow that libvirt blueprint and ensure all disks are attached. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On 2014-07-11 11:21:19 +0200 (+0200), Matthias Runge wrote: this broke horizon stable and master; heat stable is affected as well. [...] I guess this is a plea for applying something like the oslotest framework to client libraries so they get backward-compat jobs run against unit tests of all dependent/consuming software... branchless tempest already alleviates some of this, but not the case of changes in a library which will break unit/functional tests of another project. -- Jeremy Stanley ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 09/07/14 13:17, Ihar Hrachyshka wrote: Hi all, Multiple projects are suffering from db lock timeouts due to deadlocks deep in the mysqldb library that we use to interact with mysql servers. In essence, the problem is due to missing eventlet support in the mysqldb module, meaning when a db lock is encountered, the library does not yield to the next green thread, allowing other threads to eventually unlock the grabbed lock, and instead it just blocks the main thread, which eventually raises a timeout exception (OperationalError). The failed operation is not retried, leaving the failing request not served. In Nova, there is a special retry mechanism for deadlocks, though I think it's more a hack than a proper fix. Neutron is one of the projects that suffer from those timeout errors a lot. Partly it's due to lack of discipline in how we do nested calls in l3_db and ml2_plugin code, but that's not something to change in the foreseeable future, so we need to find another solution that is applicable for Juno. Ideally, the solution should be applicable for Icehouse too to allow distributors to resolve existing deadlocks without waiting for Juno. We've had several discussions and attempts to introduce a solution to the problem. Thanks to the oslo.db guys, we now have a more or less clear view of the cause of the failures and how to easily fix them. The solution is to switch mysqldb to something eventlet-aware. The best candidate is probably the MySQL Connector module, which is an official MySQL client for Python and shows some (preliminary) good results in terms of performance. I've done additional testing, creating 2000 networks in parallel (10 thread workers) for both drivers and comparing results. With mysqldb: 215.81 sec With mysql-connector: 88.66 sec ~2.4 times performance boost, ok? ;) I think we should switch to that library *even* if we forget about all the nasty deadlocks we experience now. I've posted a Neutron spec for the switch to the new client in Juno at [1]. Ideally, the switch is just a matter of several fixes to oslo.db that would enable full support for the new driver already supported by SQLAlchemy, plus the 'connection' string modified in service configuration files, plus documentation updates to refer to the new official way to configure services for MySQL. The database code won't, ideally, require any major changes, though some adaptation for the new client library may be needed. That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how trivially the switch can be achieved for a service based on the example for Neutron [2]. While this is a Neutron-specific proposal, there is an obvious wish to switch to the new library globally throughout all the projects, to reduce devops burden, among other things. My vision is that, ideally, we switch all projects to the new library in Juno, though we still may leave several projects for K in case any issues arise, similar to the way projects switched to oslo.messaging during two cycles instead of one. Though looking at how easily Neutron can be switched to the new library, I wouldn't expect any issues that would postpone the switch till K. It was mentioned in comments to the spec proposal that there were some discussions at the latest summit around a possible switch in the context of Nova that revealed some concerns, though they do not seem to be documented anywhere.
So if you know anything about it, please comment. So, we'd like to hear from other projects what's your take on that move, whether you see any issues or have concerns about it. Thanks for your comments, /Ihar [1]: https://review.openstack.org/#/c/104905/ [2]: https://review.openstack.org/#/c/105209/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
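For readers wondering what the switch looks like mechanically, here is a minimal sketch at the SQLAlchemy level (the user, password, host and database name are illustrative assumptions; in a real deployment the same change lands in each service's [database] connection option):

    # minimal sketch of the driver switch; the credentials and host below
    # are placeholders, not real deployment values
    from sqlalchemy import create_engine

    # today, with the default MySQLdb driver:
    # engine = create_engine("mysql://neutron:secret@127.0.0.1/neutron?charset=utf8")

    # after the proposed switch to MySQL Connector/Python:
    engine = create_engine(
        "mysql+mysqlconnector://neutron:secret@127.0.0.1/neutron?charset=utf8")

The mysql+mysqlconnector dialect ships with SQLAlchemy itself, which is why the spec describes the change as mostly a connection-string and oslo.db plumbing exercise.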
Re: [openstack-dev] [all] Treating notifications as a contract
On 7/10/2014 12:10 PM, Chris Dent wrote: On Thu, 10 Jul 2014, Julien Danjou wrote: My initial plan was to leverage a library like voluptuous to do schema based validation on the sender side. That would allow the receiver to introspect the schema and know the data structure to expect. I didn't think deeply on how to handle versioning, but that should be doable too. It's not clear to me in this discussion what it is that is being versioned, contracted or standardized. Is it each of the many different notifications that various services produce now? Is it the general concept of a notification which can be considered a sample that something like Ceilometer or StackTach might like to consume? The only real differences between a sample and an event are: 1. the size of the context. Host X CPU = 70% tells you nearly everything you need to know. But compute.scheduler.host_selected will require lots of information to tell you why and how host X was selected. The event payload should be atomic and not depend on previous events for context. With samples, the context is sort of implied by the key or queue name. 2. The handling of Samples can be sloppy. If you miss a CPU sample, just wait for the next one. But if you drop an Event, a billing report is going to be wrong or a dependent system loses sync. 3. There are a *lot* more samples emitted than events. Samples are a shotgun blast while events are registered mail. This is why samples don't usually have the schema problems of events. They are so tiny, there's not much to change. Putting a lot of metadata in a sample is generally a bad idea. Leave it to the queue or key name. That said, Monasca is doing some really cool stuff with high-speed sample processing such that the likelihood of dropping a sample is so low that event support should be able to come from the same framework. The difference is simply the size of the payload and if the system can handle it at volume (quickly and reliably). ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
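To make the size/context contrast above concrete, here is a purely illustrative pair of payloads (every field name is an assumption for illustration, not a proposed schema):

    # purely illustrative -- not a proposed schema
    sample = {
        "name": "cpu_util",        # meaning largely implied by key/queue name
        "resource_id": "host-x",
        "volume": 70.0,
        "unit": "%",
    }

    event = {
        "event_type": "compute.scheduler.host_selected",
        "message_id": "a3c7...",   # losing one of these breaks billing/sync
        "payload": {
            # atomic: carries everything needed to explain why and how
            # host X was selected, with no dependency on earlier events
            "instance_id": "...",
            "selected_host": "host-x",
            "weighed_hosts": ["host-x", "host-y"],
        },
    }

Dropping a sample loses one data point out of thousands; dropping the event loses the only record of the decision.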
[openstack-dev] [cinder][replication-api] Questions about #64026
Hi Ronen, hello everybody else, now that I'm trying to write a DRBD implementation for the Replication API (https://review.openstack.org/#/c/64026/) a few questions pop up. As requested by Ronen I'll put them here on -dev, so that the questions (and, hopefully, the answers ;) can be easily found. To provide a bit of separation I'll do one question per mail, and each in a subthread of this mail. Thanks for the patience, I'm looking forward to hearing your helpful ideas! Regards, Phil -- : Ing. Philipp Marek : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com : DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [cinder][replication-api] extra_specs too constant
I think that extra_specs in the database is too static, too hard to change. In the case of e.g. DRBD, where many nodes may provide some storage space, the replication_partners list is likely to change often, if only because newly added nodes have to be included[1]. This means that a) the admin has to add each node manually, and b) volume_type_extra_specs:value is a VARCHAR(255), which can only hold a few host names. (With FQDNs, even fewer.) What if, instead, each node reported "I'm product XYZ, version compat N-M" (e.g. via get_volume_stats), and all nodes that report the same product with an overlapping version range were considered eligible for replication? Furthermore, replication_rpo_range might depend on other circumstances too... if the network connection to the second site is heavily loaded, the RPO will vary, too - from a few seconds to a few hours. So, should we announce a range of (0,7200)? Ad 1: because OpenStack sees by itself which nodes are available. -- : Ing. Philipp Marek : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com : DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
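A rough sketch of what that self-description could look like through get_volume_stats (every replication_* key below is invented for illustration and is not part of the existing Cinder driver API):

    # rough sketch only -- the replication_* keys are invented for
    # illustration, not part of the existing Cinder driver API
    class DrbdDriver(object):
        def get_volume_stats(self, refresh=False):
            return {
                'volume_backend_name': 'drbd-1',
                'vendor_name': 'LINBIT',
                'storage_protocol': 'DRBD',
                # "I'm product XYZ, version compat N-M":
                'replication_product': 'DRBD',
                'replication_compat_min': 8,
                'replication_compat_max': 9,
            }

The scheduler could then pair any two backends that report the same product with overlapping [min, max] ranges, instead of the admin maintaining a replication_partners VARCHAR by hand.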
[openstack-dev] [cinder][replication-api] replication_rpo_range - why two values?
replication_rpo_range currently gets set with two values - a lower and an upper bound. File cinder/scheduler/filter_scheduler.py:118 has: if target_rpo < rpo_range[0] or target_rpo > rpo_range[1]: Why do we check for target_rpo > rpo_range[1]? Don't use that one if replication is too fast? Because we'd like to revert to an older state? I believe that using snapshots would be more sane for that use case. Or I just don't understand the reason, which is very likely, too. -- : Ing. Philipp Marek : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com : DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [all] Treating notifications as a contract
On Fri, 11 Jul 2014, Eoghan Glynn wrote: A notification of compute.instance.create.start is naturally going to carry different types of data than a volume.snapshot.delete.end for example, but of course we'd seek to accommodate that difference within a generic structure as far as possible. Is it going to carry different types of data or different values on identical keys? The goal would be that the meaning of a notification for compute.instance.create.start and volume.snapshot.delete.end should be found on the same set of keys. That's all the proposal is. Elsewhere there's been some discussion of ActivityStreams as a possible model. Is that the social media syndication format you're talking about? That's been discussed elsewhere as a possible model for openstack notifications? I missed that, can you provide a link? I'm making something of a leap from http://lists.openstack.org/pipermail/openstack-dev/2014-July/039965.html which links to https://wiki.openstack.org/wiki/NotificationSystem which mentions PUSH, which, in the time since that NotificationSystem document was written, has evolved to be the transport for ActivityStreams, where activities are events. It may be, they have the same basic idea: stuff that happens can be represented by a sequence of relatively stupid event-like things. OK, I might be missing something here, but we seem to have a close approximation of that already: stuff happens == events pop out on the bus So is your point that our events aren't dumb enough? Events are too different in form, from event to event. (e.g. encode too much structure, or carry too much data, or require too much interpretation?) Of these three, require too much interpretation is perhaps the best match. The initial version of ActivityStreams was based on atom+xml, but later people realized this would be a ton easier if it was some well known and well formed JSON. Are you suggesting here we adopt atom+json? No, I'm just charting the path from a complex over-schematized format (atom+xml) through to one which is easier to consume (atom+json) to one which is even easier to consume (a well known dictionary). Has the notification lost any expressive power? Not much by the looks of it. Is it a good idea to drop for example the error range on the floor because ceilometer doesn't know what to do with it? The error range isn't being dropped: it would be in the extras because it is of its own special meaning. If there's agreement that error_range is a useful general key, then that would be a part of the standard somewhere. Dunno, that would depend on whether (a) the delta value reported by IPMI is actually meaningful and (b) some other consumer can do something smart with it. As I worried before, this concrete example seems to have diverted you to focussing on the details of what is a complete strawman made in a few minutes. Whether the delta value is dropped is an unknown at this time. It might be that it could make sense for the producer to emit multiple events where it now emits one. I don't really think that's that important to the discussion, right now. I don't know enough about IPMI to comment on (a). My instinct would be to leave the door open for (b). I don't reckon the door is being shut in any way. But overall, it doesn't look like too radical a change in format. No, it's not. What it is is a minor change that everyone could adopt and thus be able to play in the party. The metric-oriented view is a subset/specialized aspect of something more generic. Okay, so we agree on that.
Can we agree that it might be possible to represent that generic thing with a common grammar? No, you are seeing the evolution of my thinking: as I write about this stuff and try to understand it, it becomes more clear (to me) that samples ought to be treated as event-like thingies. OK, so that shift makes it a little difficult to follow and engage with. This is a mailing list for a distributed project; it is the sole place where we can get some reasonable simulation of having a chat with a wide body of interested parties. The point of having a chat is to explore, think, and exchange and hone ideas. I certainly hope you don't expect me and everyone else only to write to the list when we have a fully researched proposal. This (apart from summits and mid-cycles) is the most accessible place we have to which we can go to create the information that allows us to do the compare and contrast that eventually leads to creating proposals. OK, so perhaps I misread your earlier comments on samples and assumed a predisposition to ceilometer's requirements was being proposed? It seems that way. Throughout I've been saying notifications should be easy for anyone to create and anyone to consume and have confidence they are doing it right. Can you start fleshing out exactly what you mean by a standard not necessarily predisposed to ceilometer, still sufficiently close to eliminate the need
[openstack-dev] [Heat] [TripleO] Extended get_attr support for ResourceGroup
Hi all, This is a follow-up to Clint Byrum's suggestion to add the `Map` intrinsic function[0], Zane Bitter's response[1] and Randall Burt's addendum[2]. Sorry for bringing it up again, but I'd love to reach consensus on this. The summary of the previous conversation: 1. TripleO is using some functionality currently not supported by Heat around scaled-out resources 2. Clint proposed a `map` intrinsic function that would solve it 3. Zane said Heat has historically been against for-loop functionality 4. Randall suggested ResourceGroup's attribute passthrough may do what we need I've looked at the ResourceGroup code and experimented a bit. It does do some of what TripleO needs but not all. Here's what we're doing with our scaled-out resources (what we'd like to wrap in a ResourceGroup or similar in the future): 1. Building a comma-separated list of RabbitMQ nodes: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L642 This one is easy with ResourceGroup's inner attribute support: list_join: - , - {get_attr: [controller_group, name]} (controller_group is a ResourceGroup of Nova servers) 2. Get the name of the first Controller node: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L339 Possible today: {get_attr: [controller_group, resource.0.name]} 3. List of IP addresses of all controllers: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405 We cannot do this, because resource group doesn't support extended attributes. Would need something like: {get_attr: [controller_group, networks, ctlplane, 0]} (ctlplane is the network controller_group servers are on) 4. IP address of the first node in the resource group: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/swift-deploy.yaml#L29 Can't do: extended attributes are not supported for the n-th node of the group either. This can be solved by `get_resource` working with resource IDs: get_attr: - {get_attr: [controller_group, resource.0]} - [networks, ctlplane, 0] (i.e. we get the server's ID from the ResourceGroup and change `get_attr` to work with the IDs too. Would also work if `get_resource` understood IDs). Alternatively, we could extend the ResourceGroup's get_attr behaviour: {get_attr: [controller_group, resource.0.networks.ctlplane.0]} but the former is a bit cleaner and more generic. --- That was the easy stuff, where we can get by with the current functionality (plus a few fixes). What follows are examples that really need new intrinsic functions (or seriously complicating the ResourceGroup attribute code and syntax). 5. Building a list of {ip: ..., name: ...} dictionaries to configure haproxy: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L478 This really calls for a mapping/for-each kind of functionality. Trying to invent a ResourceGroup syntax for this would be perverse. Here's what it could look like under Clint's `map` proposal: map: - ip: {get_attr: [{get_resource: $1}, networks, ctlplane, 0]} name: {get_attr: [{get_resource: $1}, name]} - {get_attr: [compute_group, refs]} (this relies on `get_resource` working with resource IDs. Alternatively, we could have a `resources` attribute for ResourceGroup that returns objects that can be used with get_attr.) 6.
Building the /etc/hosts file https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L585 Same as above, but also joining two lists together. We can use nested {list_join: [\n, [...]]} just as we're doing now, but having a `concat_list` function would make this and some other cases shorter and clearer. 7. Building the list of Swift devices: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/swift-deploy.yaml#L23 In addition to the above, we're adding a single element at the beginning of a list. Asking for `cons` support is pushing it, right? ;-) We could just wrap that in a list and use `concat_list` or keep using nested `list_join`s as in the /etc/hosts case. So this boils down to 4 feature proposals: 1. Support extended attributes in ResourceGroup's members 2. Allow a way to use a Resource ID (e.g. what you get by {get_attr: [ResourceGroup, refs]} or {get_attr: [ResourceGroup, resource.0]}) with existing intrinsic functions (get_resource, get_attr) 3. A `map` intrinsic function that turns a list of items into another list by doing operations on each item 4. A `concat_list` intrinsic function that joins multiple lists into one. I think the first two are not controversial. What about the other two? I've shown you some examples where we would
Re: [openstack-dev] Python 2.6 being dropped in K? What does that entail?
On Fri, Jul 11, 2014 at 08:45:07AM -0500, Matt Riedemann wrote: I'm hearing that python 2.6 will no longer be supported in the K release, but I'm not sure if there is an official statement about that somewhere (wiki?). I realize this means turning off the 2.6 unit test jobs, but what other runtime things are going to be explicitly removed, or, if not removed, just no longer blocked even though they are not compatible with 2.6? Dict comprehensions sound like one, but for a lot of other stuff I thought we were moving to six anyway for supporting python 3? I'm not as concerned about unit tests with 2.6 since I think a lot of development happens against 2.7, but thinking more of distro support like RHEL 6.5 vs RHEL 7, which would mean upgrading to RHEL 7 if you want K. FYI, these days RHEL has a notion of software collections so you can get access to newer supported versions of python, even for RHEL-6 if people really desperately want to stick on that version. I suspect that by the time K is released though, the vast majority will be happy to use RHEL-7 for all the new features it enables. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Containers] Nova virt driver requirements
We consider mounting untrusted filesystems on the host kernel to be an unacceptable security risk. A user can craft a malicious filesystem that exploits bugs in the kernel filesystem drivers. This is particularly bad if you allow the kernel to probe for filesystem type, since Linux has many many many filesystem drivers, most of which are likely not audited enough to be considered safe against malicious data. Even the mainstream ext4 driver had a crasher bug present for many years https://lwn.net/Articles/538898/ http://libguestfs.org/guestfs.3.html#security-of-mounting-filesystems Actually, there's a hidden assumption here that makes this statement not necessarily correct for containers. You're assuming the container has to have raw access to the device it's mounting. I believe it does in the context of the Cinder API, but it does not in the general context of mounting devices. I advocate having a filesystem-as-a-service or host-mount-API which nicely aligns with desires to mount devices on behalf of containers on the host. However, it doesn't exclude the fact that there are APIs and services whose contract is, explicitly, to provide block devices to guests. I'll reiterate again and say that is where the contract should end (it should not extend to the ability of guest operating systems to mount, that would be silly). None of this excludes having an opinion that mounting inside of a guest is a *useful feature*, even if I don't believe it to be a contractually obligated one. There is probably no harm in contemplating what mounting inside of a guest would look like. For hypervisors, this is true, but it doesn't have to be for containers because the mount operation is separate from raw read and write, so we can allow or deny them granularly. I have been considering allowing containers a read-only view of a block device. We could use seccomp to allow the mount syscall to succeed inside a container, although it would otherwise be forbidden by the missing CAP_SYS_ADMIN capability. The syscall would instead be trapped and performed by a privileged process elsewhere on the host. The read-only view of the block device should not itself be a security concern. In fact, it could prove to be a useful feature in its own right. It is the ability to write to the block device which is a risk should it be mounted. Having that read-only view also provides a certain awareness to the container of the existence of that volume. It allows the container to ATTEMPT to perform a mount operation, even if it's denied by policy. That, of course, is where seccomp would come into play... -- Regards, Eric Windisch ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
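For the curious, a rough sketch of the trap-the-mount-syscall idea using the libseccomp Python bindings (this assumes the bindings are installed, and leaves out the privileged host-side helper that would actually perform the mount after checking policy):

    # rough sketch; assumes the libseccomp Python bindings are available,
    # and omits the privileged helper that performs the mount on the host
    import seccomp

    f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)
    # deliver SIGSYS for mount() instead of the plain EPERM the container
    # would otherwise get from its missing CAP_SYS_ADMIN
    f.add_rule(seccomp.TRAP, "mount")
    f.load()

    # a SIGSYS handler (or a ptrace supervisor) would then forward the
    # request to a policy-checking privileged process elsewhere on the host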
Re: [openstack-dev] [barbican] Nominating Nathan Reller for barbican-core
+1 for me. Jarret From: Douglas Mendizabal douglas.mendiza...@rackspace.com Reply-To: OpenStack List openstack-dev@lists.openstack.org Date: Thursday, July 10, 2014 at 12:11 PM To: OpenStack List openstack-dev@lists.openstack.org, Nate Reller rellerrel...@yahoo.com Subject: [openstack-dev] [barbican] Nominating Nathan Reller for barbican-core Hi Everyone, I would also like to nominate Nathan Reller for the barbican-core team. Nathan has been involved with the Key Management effort since early 2013. Recently, Nate has been driving the development of a KMIP backend for Barbican, which will enable Barbican to be used with KMIP devices. Nate's input to the design of the plug-in mechanisms in Barbican has been extremely helpful, as well as his feedback in CR reviews. As a reminder to barbican-core members, we use the voting process outlined in https://wiki.openstack.org/wiki/Barbican/CoreTeam to add members to our team. Thanks, Doug Douglas Mendizábal IRC: redrobot PGP Key: 245C 7B6F 70E9 D8F3 F5D5 0CC9 AD14 1F30 2D58 923C ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [barbican] Nominating Ade Lee for barbican-core
+1 for me as well. Jarret From: Douglas Mendizabal douglas.mendiza...@rackspace.com Reply-To: OpenStack List openstack-dev@lists.openstack.org Date: Thursday, July 10, 2014 at 11:55 AM To: OpenStack List openstack-dev@lists.openstack.org, a...@redhat.com a...@redhat.com Subject: [openstack-dev] [barbican] Nominating Ade Lee for barbican-core Hi Everyone, I would like to nominate Ade Lee for the barbican-core team. Ade has been involved in the development of Barbican since January of this year, and he's been driving the work to enable DogTag to be used as a back end for Barbican. Ade's input to the design of barbican has been invaluable, and his reviews are always helpful, which has earned him the respect of the existing barbican-core team. As a reminder to barbican-core members, we use the voting process outlined in https://wiki.openstack.org/wiki/Barbican/CoreTeam to add members to our team. Thanks, Doug Douglas Mendizábal IRC: redrobot PGP Key: 245C 7B6F 70E9 D8F3 F5D5 0CC9 AD14 1F30 2D58 923C ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [neutron] Spec Proposal Deadline has passed, a note on Spec Approval Deadline
Just a note that yesterday we passed SPD for Neutron. We have a healthy backlog of specs, and I'm working to go through this list and make some final approvals for Juno-3 over the next week. If you've submitted a spec which is in review, please hang tight while I and the rest of the neutron cores review them. It's likely a good portion of the proposed specs may end up deferred until the K release, given where we're at in the Juno cycle now. Thanks! Kyle ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] [Gantt] Scheduler split status (updated)
Le 11/07/2014 13:14, John Garbutt a écrit : On 10 July 2014 16:59, Sylvain Bauza sba...@redhat.com wrote: Le 10/07/2014 15:47, Russell Bryant a écrit : On 07/10/2014 05:06 AM, Sylvain Bauza wrote: Hi all, === tl;dr: Now that we agree on waiting for the split prereqs to be done, we debate whether the ResourceTracker should be part of the scheduler code and consequently whether the Scheduler should expose ResourceTracker APIs so that Nova wouldn't own compute node resources. I'm proposing to first keep the RT as a Nova resource in Juno and move the ResourceTracker into the Scheduler for K, so we at least merge some patches by Juno. === Some debates occurred recently about the scheduler split, so I think it's important to loop back with you all to see where we are and what are the discussions. Again, feel free to express your opinions, they are welcome. Where did this resource tracker discussion come up? Do you have any references that I can read to catch up on it? I would like to see more detail on the proposal for what should stay in Nova vs. be moved. What is the interface between Nova and the scheduler here? Oh, missed the most important question you asked. So, about the interface between the scheduler and Nova, the original agreed proposal is in the spec https://review.openstack.org/82133 (approved) where the Scheduler exposes: - select_destinations() : for querying the scheduler to provide candidates - update_resource_stats() : for updating the scheduler internal state (ie. HostState) Here, update_resource_stats() is called by the ResourceTracker, see the implementations (in review) https://review.openstack.org/82778 and https://review.openstack.org/104556. The alternative that has just been raised this week is to provide a new interface where the ComputeNode claims and frees resources, so that all the resources are fully owned by the Scheduler. An initial PoC has been raised here https://review.openstack.org/103598 but I tried to see what would be a ResourceTracker proxified by a Scheduler client here: https://review.openstack.org/105747. As the spec hasn't been written, the names of the interfaces are not properly defined, but I made a proposal as: - select_destinations() : same as above - usage_claim() : claim a resource amount - usage_update() : update a resource amount - usage_drop() : free the resource amount Again, this is a dummy proposal, a spec has to be written if we consider moving the RT. While I am not against moving the resource tracker, I feel we could move this to Gantt after the core scheduling has been moved. I was imagining the extensible resource tracker to become (sort of) equivalent to cinder volume drivers. Also the persistent resource claims will give us another plugin point for gantt. That might not be enough, but I think it's easier to see once the other elements have moved. But the key thing I like is how the current approach amounts to refactoring, similar to the cinder move. I feel we should stick to that if possible. John Thanks John for your feedback. I'm +1 with you, we need to keep going the way we defined with the whole community: create Gantt once the prereqs are done (see my first mail above for these) and see afterwards if the line needs to move. I think this discussion would also be interesting if we took into account the current Cinder and Neutron scheduling needs, so we could say whether it's the right direction. Others ?
Note: The spec https://review.openstack.org/89893 is not yet approved today, as the Spec approval freeze happened, I would like to discuss with the team if we can have an exception on it so the work could happen by Juno. Thanks, -Sylvain ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
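To summarize the two candidate interfaces described above in code form (the signatures are illustrative guesses; the linked specs and reviews are authoritative):

    # signatures are illustrative guesses -- see the linked specs/reviews
    class SchedulerAPI(object):
        # agreed interface (spec 82133): Nova's ResourceTracker keeps
        # ownership of resources and pushes state into the scheduler
        def select_destinations(self, ctxt, request_spec, filter_properties):
            """Return candidate hosts for the requested resources."""

        def update_resource_stats(self, ctxt, host_stats):
            """Called by the ResourceTracker to refresh scheduler state."""

    class ClaimBasedSchedulerAPI(object):
        # alternative raised this week: the scheduler owns the resources
        def select_destinations(self, ctxt, request_spec, filter_properties):
            """Same as above."""

        def usage_claim(self, ctxt, host, resources):
            """Claim a resource amount on behalf of a compute node."""

        def usage_update(self, ctxt, claim, resources):
            """Update a previously claimed amount."""

        def usage_drop(self, ctxt, claim):
            """Free the claimed resources."""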
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 7/9/14, 10:59 AM, Roman Podoliaka wrote: Hi all, Not sure what issues you are talking about, but I just replaced mysql with mysql+mysqlconnector in my db connection string in neutron.conf and neutron-db-manage upgrade head worked like a charm for an empty schema. Ihar, could you please elaborate on what changes to oslo.db are needed? (as an oslo.db developer I'm very interested in this part :) ) Thanks, Roman On Wed, Jul 9, 2014 at 5:43 PM, Ihar Hrachyshka ihrac...@redhat.com wrote: On 09/07/14 15:40, Sean Dague wrote: On 07/09/2014 09:00 AM, Roman Podoliaka wrote: Hi Ihar, AFAIU, the switch is a matter of pip install + specifying the correct db URI in the config files. I'm not sure why you are filing a spec in Neutron project. IMHO, this has nothing to do with projects, but rather a purely deployment question. E.g. don't we have PostgreSQL+psycopg2 or MySQL+pymysql deployments of OpenStack right now? I think what you really want is to change the defaults we test in the gate, which is a different problem. Because this is really a *new* driver. As you can see by the attempted run, it doesn't work with alembic given the definitions that neutron has. So it's not like this is currently compatible with OpenStack code. Well, to fix that, you just need to specify raise_on_warnings=False for the connection (it's the default for mysqldb but not mysql-connector). I've done it in a devstack patch for now, but probably it belongs to oslo.db. This is also semi-my fault, as mysqlconnector apparently defaults this to False now, but for some reason the SQLAlchemy mysqlconnector dialect is flipping it to True (this dialect was contributed by MySQL-connector's folks, so not sure why the inconsistency, perhaps they changed their minds). Thanks, Roman On Wed, Jul 9, 2014 at 2:17 PM, Ihar Hrachyshka ihrac...@redhat.com wrote: Hi all, Multiple projects are suffering from db lock timeouts due to deadlocks deep in mysqldb library that we use to interact with mysql servers. In essence, the problem is due to missing eventlet support in mysqldb module, meaning when a db lock is encountered, the library does not yield to the next green thread, allowing other threads to eventually unlock the grabbed lock, and instead it just blocks the main thread, that eventually raises timeout exception (OperationalError). The failed operation is not retried, leaving failing request not served. In Nova, there is a special retry mechanism for deadlocks, though I think it's more a hack than a proper fix. Neutron is one of the projects that suffer from those timeout errors a lot. Partly it's due to lack of discipline in how we do nested calls in l3_db and ml2_plugin code, but that's not something to change in foreseeable future, so we need to find another solution that is applicable for Juno. Ideally, the solution should be applicable for Icehouse too to allow distributors to resolve existing deadlocks without waiting for Juno. We've had several discussions and attempts to introduce a solution to the problem. Thanks to oslo.db guys, we now have more or less clear view on the cause of the failures and how to easily fix them. The solution is to switch mysqldb to something eventlet aware. The best candidate is probably MySQL Connector module that is an official MySQL client for Python and that shows some (preliminary) good results in terms of performance. I've posted a Neutron spec for the switch to the new client in Juno at [1].
Ideally, switch is just a matter of several fixes to oslo.db that would enable full support for the new driver already supported by SQLAlchemy, plus 'connection' string modified in service configuration files, plus documentation updates to refer to the new official way to configure services for MySQL. The database code won't, ideally, require any major changes, though some adaptation for the new client library may be needed. That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how trivial the switch can be achieved for a service based on example for Neutron [2]. While this is a Neutron specific proposal, there is an obvious wish to switch to the new library globally throughout all the projects, to reduce devops burden, among other things. My vision is that, ideally, we switch all projects to the new library in Juno, though we still may leave several projects for K in case any issues arise, similar to the way projects switched to oslo.messaging during two cycles instead of one. Though looking at how easy Neutron can be switched to the new library, I wouldn't expect any issues that would postpone the switch till K. It was mentioned in comments to the spec proposal that there were some discussions at the latest summit
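For reference, the raise_on_warnings workaround discussed above can be expressed like this (a sketch; the connection URL is a placeholder):

    # sketch of the raise_on_warnings workaround; the URL is a placeholder.
    # connect_args is passed straight through to the DBAPI's connect()
    from sqlalchemy import create_engine

    engine = create_engine(
        "mysql+mysqlconnector://neutron:secret@127.0.0.1/neutron",
        connect_args={'raise_on_warnings': False})  # match MySQLdb's default

Whether this ultimately lives in each service's connection string or inside oslo.db's engine setup is exactly the open question raised in this thread.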
Re: [openstack-dev] About Swift as an object storage gateway, like Cinder in block storage
On Mon, 7 Jul 2014 11:05:40 +0800 童燕群 tyan...@qq.com wrote: The workflow of this middle-ware working with swift may be like this pic: Since you're plugging this into a/c/o nodes, there's no difference between this and Pluggable Back-ends. Note that PBE is already implemented in the case of the object server, see class DiskFile. Account/Container remainder is here: https://review.openstack.org/47713 Do you have a request from your operations to implement this, or is it a nice-to-have exercise for you? If the former, what specific vendor store are you targeting? -- Pete P.S. Note that Cinder includes a large management component, which Swift lacks by itself. In Cinder you can add new back-ends through Cinder's API and CLI. In Swift, you have to run swift-ring-builder and edit configs. Your blueprint does not address this gap. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [TripleO] New test envs deployed
Hi All, this morning we deployed new testenv images on the rh1 rack for CI; a number of things have changed: 1. Each TE now contains 15 nodes; this should allow us to deploy more VMs per job (essentially paving the way to allow us to add multiple HA controllers to our overcloud job) 2. Instances now only have 2G of RAM (they had 4G, but it should really match local devtest runs) 3. Instances are now i386 (this again matches the devtest default and causes instances to require less RAM) It's too early to know if we broke anything, so we should keep an eye on it for a bit. A number of regressions crept into our scripts since the last time we did this; here are the relevant patches (yes, we should figure out how to CI this somehow) Regression fixes https://review.openstack.org/#/c/106340/ https://review.openstack.org/#/c/106341/ https://review.openstack.org/#/c/106343/ https://review.openstack.org/#/c/106390/ https://review.openstack.org/#/c/106391/ Extra changes needed https://review.openstack.org/#/c/106358/ (this should be merged ASAP) https://review.openstack.org/#/c/106342/ https://review.openstack.org/#/c/106352/ https://review.openstack.org/#/c/106390/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Where should a test for eventlet and oslo.db interaction go?
Hello all. After discussion in IRC, I agree that we should take care about interaction with eventlet, at least because right now eventlet is the OpenStack production configuration. So we must be sure that we'll get no issues when we work with eventlet. So I agree that it makes sense to add a test for eventlet and SQLAlchemy interaction to oslo.db. You convinced me :) On Thu, Jul 10, 2014 at 7:05 PM, Mike Bayer mba...@redhat.com wrote: On 7/10/14, 7:47 AM, Sean Dague wrote: Honestly, that seems weird to me. oslo.db is built as a common layer for OpenStack services. eventlet is used by most OpenStack services. There are lots of known issues with eventlet vs. our db access patterns. Knowing that the db layer works in the common OpenStack pattern seems really important. And something that seems to make sense very close to the db code itself. Yeah I am +1 on this, the use of eventlet is very prominent throughout openstack components, and oslo.db is intended as a glue layer between SQLAlchemy and those apps. The patterns that are used with eventlet should be tested at the oslo.db level, as oslo.db is responsible for configuration of the driver and additionally IMO should be taking on a much greater role in establishing transactional patterns which also have an impact on these issues. SQLAlchemy itself never spawns any threads. However, we certainly have a crap-ton of tests that test the connection pool and other concurrency-sensitive areas in the context of many threads being run. It's a critical use case so we test against it. oslo.db should at every turn be attempting to remove redundancy from downstream projects. If ten projects all use eventlet, they shouldn't all have to replicate the same test over and over; that should just be upstream of them. -Sean On 07/10/2014 07:42 AM, Victor Sergeyev wrote: Hello Angus! IMO, the simple answer on your question is - tests for eventlet and oslo.db interaction should be in the same place, where eventlet and oslo.db interact. :) A little digression - we suppose, that oslo.db should neither know, nor take care whether target projects use eventlet/gevent/OS threads/multiple processes/callbacks/etc for handling concurrency - oslo.db just can't (and should not) make such decisions for users. For the very same reason SQLAlchemy doesn't do that. Thanks, Victor On Thu, Jul 10, 2014 at 10:55 AM, Angus Lees gusl...@gmail.com wrote: We have an issue with neutron (and presumably elsewhere), where mysqldb and eventlet may deadlock, until the mysqldb deadlock timer fires. I believe it's responsible for ~all of these failures: http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkxvY2sgd2FpdCB0aW1lb3V0IGV4Y2VlZGVkOyB0cnkgcmVzdGFydGluZyB0cmFuc2FjdGlvblwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDA0OTcwMzgwMjc0fQ== Now, the fix is one thing and is underway (the current favourite option is just switching to a different mysql client library) - my question here is instead about this test: https://review.openstack.org/#/c/104436/ This test (as written) is against oslo.db and drives eventlet + sqlalchemy to confirm that the current sqlalchemy driver does _not_ have the above deadlock observed with mysqldb. I think it (or some version of it) is an important test, but the oslo.db guys don't want it in their testsuite since they've purged every explicit mention of eventlet. I'm sympathetic to this pov.
I think we should have something like this test *somewhere*, at least as long as we're using eventlet frequently. I'm a bit new to openstack, so I'm lost in a maze of testing options. Could some kind member of the TC point to where this test *should* go? -- - Gus ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
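For readers following along, the shape of the test under discussion is roughly this (a condensed sketch, not the actual review; it assumes a reachable MySQL server with a table t and an eventlet-aware driver):

    # condensed sketch of the kind of test discussed -- not the actual
    # review.  Assumes a reachable MySQL server with a table `t` and an
    # eventlet-aware driver; with MySQLdb the contending query would block
    # the whole process in C code instead of yielding to the hub.
    import time

    import eventlet
    eventlet.monkey_patch()
    from sqlalchemy import create_engine

    engine = create_engine("mysql+mysqlconnector://root:secret@127.0.0.1/test")

    def hold_lock():
        conn = engine.connect()
        trans = conn.begin()
        conn.execute("SELECT * FROM t WHERE id = 1 FOR UPDATE")
        eventlet.sleep(2)              # hold the row lock for a while
        trans.rollback()
        conn.close()

    def contend():
        eventlet.sleep(0.2)            # let hold_lock() win the race
        conn = engine.connect()
        trans = conn.begin()
        conn.execute("SELECT * FROM t WHERE id = 1 FOR UPDATE")  # blocks...
        trans.rollback()
        conn.close()

    def heartbeat():
        for _ in range(20):            # ...but only that one greenthread
            eventlet.sleep(0.1)

    start = time.time()
    threads = [eventlet.spawn(f) for f in (hold_lock, contend, heartbeat)]
    for gt in threads:
        gt.wait()
    # the three greenthreads overlap, so total wall time stays close to
    # the 2s lock hold; a driver that blocks the hub would serialize them
    assert time.time() - start < 4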
Re: [openstack-dev] Where should a test for eventlet and oslo.db interaction go?
On 07/10/2014 07:47 AM, Sean Dague wrote: Honestly, that seems weird to me. oslo.db is built as a common layer for OpenStack services. eventlet is used by most OpenStack services. There are lots of known issues with eventlet vs. our db access patterns. Knowing that the db layer works in the common OpenStack pattern seems really important. And something that seems to make sense very close to the db code itself. +1 -jay On 07/10/2014 07:42 AM, Victor Sergeyev wrote: Hello Angus! IMO, the simple answer on your question is - tests for eventlet and oslo.db interaction should be in the same place, where eventlet and oslo.db interact. :) A little digression - we suppose, that oslo.db should neither know, nor take care whether target projects use eventlet/gevent/OS threads/multiple processes/callbacks/etc for handling concurrency - oslo.db just can't (and should not) make such decisions for users. For the very same reason SQLAlchemy doesn't do that. Thanks, Victor On Thu, Jul 10, 2014 at 10:55 AM, Angus Lees gusl...@gmail.com wrote: We have an issue with neutron (and presumably elsewhere), where mysqldb and eventlet may deadlock, until the mysqldb deadlock timer fires. I believe it's responsible for ~all of these failures: http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkxvY2sgd2FpdCB0aW1lb3V0IGV4Y2VlZGVkOyB0cnkgcmVzdGFydGluZyB0cmFuc2FjdGlvblwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDA0OTcwMzgwMjc0fQ== Now, the fix is one thing and is underway (the current favourite option is just switching to a different mysql client library) - my question here is instead about this test: https://review.openstack.org/#/c/104436/ This test (as written) is against oslo.db and drives eventlet + sqlalchemy to confirm that the current sqlalchemy driver does _not_ have the above deadlock observed with mysqldb. I think it (or some version of it) is an important test, but the oslo.db guys don't want it in their testsuite since they've purged every explicit mention of eventlet. I'm sympathetic to this pov. I think we should have something like this test *somewhere*, at least as long as we're using eventlet frequently. I'm a bit new to openstack, so I'm lost in a maze of testing options. Could some kind member of the TC point to where this test *should* go? -- - Gus ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Keystone][devstack] Keystone is now gating (Juno and beyond) on Apache + mod_wsgi deployed Keystone
The Keystone team is happy to announce that as of yesterday (July 10th 2014), with the merge of https://review.openstack.org/#/c/100747/ Keystone is now gating on an Apache + mod_wsgi based deployment. This also moved the default for devstack to deploy Keystone under Apache. This is in line with the statement that Apache + mod_wsgi is the recommended deployment for Keystone, as opposed to using “keystone-all”. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
Since Juno-2 is quickly approaching, I wanted to update everyone on where we're at with regards to third party testing in Neutron. The etherpad here [1] was the original link with status. The link here [2] shows what is expected of Neutron third party CI systems. On the CI status side, I'd like to ask the owners of the following CI systems to attend Monday's third party meeting [3] to discuss the status of their CI systems. These are the ones which appear to be in trouble, aren't running, or have some issues. 1. Cisco 1. Not enough logs being saved. 2. Log retention issues. 2. Citrix Netscaler LBaaS driver 1. I don't think this has a third party CI system running. 3. Embrane (both plugin and LBaaS driver) 1. Logs are tarred up and not viewable in web browser. 2. Inconsistent runs at times. 4. IBM SDN-VE 1. Currently inactive, moving to a new system. 5. One Convergence 1. Very high failure rate for patch runs. 6. OpenDaylight 1. Logs are tarred up and not viewable in web browser 7. PLUMgrid 1. Not saving enough logs 8. Radware 1. Logs are not viewable in browser 9. Tail-F 1. Inconsistent past runs, need updates on status. 10. vArmour FWaaS driver 1. Can't view logs. 2. Inconsistent runs against patches. I'd like to take some time in the Monday meeting to go over the issues these CI systems are having and give the maintainers a chance to discuss this with us. The third party team is hopeful we can spend the energy in the meeting working with CI maintainers who are actively interested in making progress on improving their CI systems. Per my email to the list in June [4], the expectation is that third party CI systems in Neutron are running and following the guidelines set forth by both Neutron and Infra. The weekly meeting is a place to seek help, and we're happy that a large number of third party CI owners and maintainers are using this resource. I'd also like to encourage anyone with a patch for a plugin or driver in Neutron to participate in the third-party meetings going forward as well. This will help to ensure your CI system is running while your patch is being reviewed, and you actively work to sort out issues during the review process to ensure smooth merging of your plugin or driver. Thank you! Kyle [1] https://etherpad.openstack.org/p/ZLp9Ow3tNq [2] https://wiki.openstack.org/wiki/NeutronThirdPartyTesting [3] https://wiki.openstack.org/wiki/Meetings/ThirdParty [4] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037665.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Keystone][devstack] Keystone is now gating (Juno and beyond) on Apache + mod_wsgi deployed Keystone
On 07/11/2014 05:43 PM, Morgan Fainberg wrote: The Keystone team is happy to announce that as of yesterday (July 10th 2014), with the merge of https://review.openstack.org/#/c/100747/ Keystone is now gating on Apache + mod_wsgi based deployment. This also has moved the default for devstack to deploy Keystone under apache. This is in-line with the statement that Apache + mod_wsgi is the recommended deployment for Keystone, as opposed to using “keystone-all”. Thanks for the heads up. This is something Marconi's team would love to do in devstack as well. Flavio -- @flaper87 Flavio Percoco ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Neutron permission issue
Hi, As a tenant, when I try to create a router and associate a gateway with the router as a two-step process in Horizon, things work fine. Now when I want to do the same thing through a create router API call with the request below, I get permission denied to create the router:

    {
        "router": {
            "name": "another_router",
            "admin_state_up": true,
            "external_gateway_info": {
                "network_id": "3c5bcddd-6af9-4e6b-9c3e-c153e521cab8",
                "enable_snat": false
            }
        }
    }

The network id in both cases is the same. This does not make sense to me.

    Traceback (most recent call last):
      File "vm-tp.py", line 54, in setUp
        ext_router = self.net.create_router(CONF.ROUTER_NAME, ext_net['id'])
      File "/Users/akalambu/python_venv/latest_code/pns/network.py", line 121, in create_router
        router = self.neutron_client.create_router(body)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 101, in with_params
        ret = self.function(instance, *args, **kwargs)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 398, in create_router
        return self.post(self.routers_path, body=body)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1320, in post
        headers=headers, params=params)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1243, in do_request
        self._handle_fault_response(status_code, replybody)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1211, in _handle_fault_response
        exception_handler_v20(status_code, des_error_body)
      File "/Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 68, in exception_handler_v20
        status_code=status_code)
    Forbidden: Policy doesn't allow create_router to be performed.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
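One likely culprit (an assumption on my part, not something confirmed in this thread): neutron's default policy allows plain router creation for any user but restricts setting enable_snat in external_gateway_info to admins. That would explain why the two-step Horizon flow (which never sends enable_snat) works while the one-shot request above is rejected. The relevant default rules look roughly like this (verify against your deployed etc/neutron/policy.json):

    "create_router": "rule:regular_user",
    "create_router:external_gateway_info:enable_snat": "rule:admin_only",

Dropping enable_snat from the request body, or performing the gateway step as admin, would confirm or refute this.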
Re: [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
Thanks Youcef. I didn't see any results or information for it; it would be excellent if you could reply to the thread with the info and also come to the meeting Monday. Kyle On Fri, Jul 11, 2014 at 11:57 AM, Youcef Laribi youcef.lar...@citrix.com wrote: Vijay, You need to reply to this and inform Kyle that we do have a CI system. *From:* Kyle Mestery [mailto:mest...@noironetworks.com] *Sent:* Friday, July 11, 2014 8:57 AM *To:* OpenStack Development Mailing List (not for usage questions) *Subject:* [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron Since Juno-2 is quickly approaching, I wanted to update everyone on where we're at with regards to third party testing in Neutron. The etherpad here [1] was the original link with status. The link here [2] shows what is expected of Neutron third party CI systems. On the CI status side, I'd like to ask the owners of the following CI systems to attend Monday's third party meeting [3] to discuss the status of their CI systems. These are the ones which appear to be in trouble, aren't running, or have some issues. 1. Cisco 1. Not enough logs being saved. 2. Log retention issues. 2. Citrix Netscaler LBaaS driver 1. I don't think this has a third party CI system running. 3. Embrane (both plugin and LBaaS driver) 1. Logs are tarred up and not viewable in web browser. 2. Inconsistent runs at times. 4. IBM SDN-VE 1. Currently inactive, moving to a new system. 5. One Convergence 1. Very high failure rate for patch runs. 6. OpenDaylight 1. Logs are tarred up and not viewable in web browser 7. PLUMgrid 1. Not saving enough logs 8. Radware 1. Logs are not viewable in browser 9. Tail-F 1. Inconsistent past runs, need updates on status. 10. vArmour FWaaS driver 1. Can't view logs. 2. Inconsistent runs against patches. I'd like to take some time in the Monday meeting to go over the issues these CI systems are having and give the maintainers a chance to discuss this with us. The third party team is hopeful we can spend the energy in the meeting working with CI maintainers who are actively interested in making progress on improving their CI systems. Per my email to the list in June [4], the expectation is that third party CI systems in Neutron are running and following the guidelines set forth by both Neutron and Infra. The weekly meeting is a place to seek help, and we're happy that a large number of third party CI owners and maintainers are using this resource. I'd also like to encourage anyone with a patch for a plugin or driver in Neutron to participate in the third-party meetings going forward as well. This will help to ensure your CI system is running while your patch is being reviewed, and you actively work to sort out issues during the review process to ensure smooth merging of your plugin or driver. Thank you! Kyle [1] https://etherpad.openstack.org/p/ZLp9Ow3tNq [2] https://wiki.openstack.org/wiki/NeutronThirdPartyTesting [3] https://wiki.openstack.org/wiki/Meetings/ThirdParty [4] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037665.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
Before we get too far ahead of ourselves, mysql-connector is not hosted on PyPI. Instead it is an external package link. We recently managed to remove all packages that are hosted as external package links from OpenStack and will not add new ones in. Before we can use mysql-connector in the gate, Oracle will need to publish mysql-connector on PyPI properly. That said, there is at least one other pure-Python alternative, PyMySQL. PyMySQL supports py3k and pypy. We should look at using PyMySQL instead if we want to start with a reasonable path to getting this in the gate. Clark On Fri, Jul 11, 2014 at 10:07 AM, Miguel Angel Ajo Pelayo mangel...@redhat.com wrote: +1 here too, Amazed with the performance gains, x2.4 seems a lot, and we'd get rid of deadlocks. - Original Message - +1 I'm pretty excited about the possibilities here. I've had this mysqldb/eventlet contention in the back of my mind for some time now. I'm glad to see some work being done in this area. Carl On Fri, Jul 11, 2014 at 7:04 AM, Ihar Hrachyshka ihrac...@redhat.com wrote: On 09/07/14 13:17, Ihar Hrachyshka wrote: Hi all, Multiple projects are suffering from db lock timeouts due to deadlocks deep in mysqldb library that we use to interact with mysql servers. In essence, the problem is due to missing eventlet support in mysqldb module, meaning when a db lock is encountered, the library does not yield to the next green thread, allowing other threads to eventually unlock the grabbed lock, and instead it just blocks the main thread, that eventually raises timeout exception (OperationalError). The failed operation is not retried, leaving failing request not served. In Nova, there is a special retry mechanism for deadlocks, though I think it's more a hack than a proper fix. Neutron is one of the projects that suffer from those timeout errors a lot. Partly it's due to lack of discipline in how we do nested calls in l3_db and ml2_plugin code, but that's not something to change in foreseeable future, so we need to find another solution that is applicable for Juno. Ideally, the solution should be applicable for Icehouse too to allow distributors to resolve existing deadlocks without waiting for Juno. We've had several discussions and attempts to introduce a solution to the problem. Thanks to oslo.db guys, we now have more or less clear view on the cause of the failures and how to easily fix them. The solution is to switch mysqldb to something eventlet aware. The best candidate is probably MySQL Connector module that is an official MySQL client for Python and that shows some (preliminary) good results in terms of performance. I've made additional testing, creating 2000 networks in parallel (10 thread workers) for both drivers and comparing results. With mysqldb: 215.81 sec With mysql-connector: 88.66 ~2.4 times performance boost, ok? ;) I think we should switch to that library *even* if we forget about all the nasty deadlocks we experience now. I've posted a Neutron spec for the switch to the new client in Juno at [1].
That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how trivially the switch can be achieved for a service from the example for Neutron [2]. While this is a Neutron-specific proposal, there is an obvious wish to switch to the new library globally throughout all the projects, to reduce devops burden, among other things. My vision is that, ideally, we switch all projects to the new library in Juno, though we may still leave several projects for K in case any issues arise, similar to the way projects switched to oslo.messaging during two cycles instead of one. Though looking at how easily Neutron can be switched to the new library, I wouldn't expect any issues that would postpone the switch till K. It was mentioned in comments to the spec proposal that there were some discussions at the latest summit around a possible switch in the context of Nova that revealed some concerns, though they do not seem to be documented anywhere. So if you know anything about it, please comment. So, we'd like to hear from other projects: what's your take on that move, and whether you see any issues or have concerns about it. Thanks for your comments, /Ihar [1]:
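For readers wondering what the "connection string modified in service configuration files" part amounts to, here is a minimal sketch of the edit, with placeholder credentials and host; the dialect names are SQLAlchemy's generic MySQL dialects, not anything Neutron-specific:

    # /etc/neutron/neutron.conf (before): a bare mysql:// URL selects the
    # C-based MySQL-Python (mysqldb) driver by default
    [database]
    connection = mysql://neutron:secret@192.0.2.10/neutron

    # after: same server and schema, pure-Python eventlet-friendly driver
    [database]
    connection = mysql+mysqlconnector://neutron:secret@192.0.2.10/neutron

    # or, with the PyPI-hosted alternative Clark mentions:
    # connection = mysql+pymysql://neutron:secret@192.0.2.10/neutron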
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On 07/11/2014 05:29 AM, Thierry Carrez wrote: Matthias Runge wrote: On 11/07/14 02:04, Michael Still wrote: Sorry for the delay here. This email got lost in my inbox while I was travelling. This release is now tagged. Additionally, I have created a milestone for this release in launchpad, which is the keystone process for client releases. This means that users of launchpad can now see what release a given bug was fixed in, and improves our general launchpad bug hygiene. However, because we haven't done this before, this first release is a bit bigger than it should be. I'm having some pain marking the milestone as released in launchpad, but I am arguing with launchpad about that now. Michael Cough, this broke horizon stable and master; heat stable is affected as well. For Horizon, I filed bug https://bugs.launchpad.net/horizon/+bug/1340596 The same bug (https://bugs.launchpad.net/bugs/1340596) will be used to track Heat tasks as well. Thanks for pointing this out. These non-backwards-compatible changes should not have been merged, IMO. They really should have waited until a v2.0, or at least been done in a backwards-compatible way. I'll look into what reverts are needed. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Glance] Anyone using owner_is_tenant = False with image members?
Hi Alexander, I read through the artifact spec. Based on my reading it does not fix this issue at all. [1] Furthermore, I do not understand why the glance developers are focused on adding features like artifacts or signed images when there are significant usability problems with glance as it currently stands. This is echoing Sean Dague's comment that bugs are filed against glance but never addressed. [1] See the **Sharing Artifact** section, which indicates that sharing may only be done between projects and that the tenant owns the image. On Thu, Jul 3, 2014 at 4:55 AM, Alexander Tivelkov ativel...@mirantis.com wrote: Thanks Scott, that is a nice topic. In theory, I would prefer to have both owner_tenant and owner_user persisted with an image, and to have a policy rule which allows specifying whether the users of a tenant have access to images owned by or shared with other users of their tenant. But this would require too many changes to the current object model, and I am not sure if we need to introduce such changes now. However, this is the approach I would like to use in Artifacts. At least the current version of the spec assumes that both these fields are to be maintained ([0]) [0] https://review.openstack.org/#/c/100968/4/specs/juno/artifact-repository.rst -- Regards, Alexander Tivelkov On Thu, Jul 3, 2014 at 3:44 AM, Scott Devoid dev...@anl.gov wrote: Hi folks, Background: Among all services, I think glance is unique in only having a single 'owner' field for each image. Most other services include a 'user_id' and a 'tenant_id' for things that are scoped this way. Glance provides a way to change this behavior by setting owner_is_tenant to false, which implies that owner is user_id. This works great: new images are owned by the user that created them. Why do we want this? We would like to make sure that the only person who can delete an image (besides admins) is the person who uploaded said image. This achieves that goal nicely. Images are private to the user, who may share them with other users using the image-member API. However, one problem is that we'd like to allow users to share with entire projects / tenants. Additionally, we have a number of images (~400) migrated over from a different OpenStack deployment that are owned by the tenant, and we would like to make sure that users in that tenant can see those images. Solution? I've implemented a small patch to the is_image_visible API call [1] which checks the image.owner and image.members against context.owner and context.tenant. This appears to work well, at least in my testing. I am wondering if this is something folks would like to see integrated? Also, for glance developers: is there a cleaner way to go about solving this problem? [2] ~ Scott [1] https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/api.py#L209 [2] https://review.openstack.org/104377 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
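For those following along, a rough sketch of the check Scott describes; this is illustrative, not the actual patch under review, and the member-list handling is assumed:

    # With owner_is_tenant = False, image['owner'] holds a user id, so
    # visibility must also consider membership granted to the caller's
    # user *or* tenant.
    def is_image_visible(context, image, members):
        if context.is_admin or image.get('is_public'):
            return True
        if image['owner'] in (context.owner, context.tenant):
            return True
        return any(m['member'] in (context.owner, context.tenant)
                   for m in members if not m.get('deleted'))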
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On 07/11/2014 01:27 PM, Russell Bryant wrote: On 07/11/2014 05:29 AM, Thierry Carrez wrote: Matthias Runge wrote: On 11/07/14 02:04, Michael Still wrote: Sorry for the delay here. This email got lost in my inbox while I was travelling. This release is now tagged. Additionally, I have created a milestone for this release in launchpad, which is the keystone process for client releases. This means that users of launchpad can now see what release a given bug was fixed in, and improves our general launchpad bug hygiene. However, because we haven't done this before, this first release is a bit bigger than it should be. I'm having some pain marking the milestone as released in launchpad, but I am arguing with launchpad about that now. Michael Cough, this broke horizon stable and master; heat stable is affected as well. For Horizon, I filed bug https://bugs.launchpad.net/horizon/+bug/1340596 The same bug (https://bugs.launchpad.net/bugs/1340596) will be used to track Heat tasks as well. Thanks for pointing this out. These non-backwards-compatible changes should not have been merged, IMO. They really should have waited until a v2.0, or at least been done in a backwards-compatible way. I'll look into what reverts are needed. I posted a couple of reverts that I think will resolve these problems: https://review.openstack.org/#/c/106446/ https://review.openstack.org/#/c/106447/ -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
Thanks Kyle for the update. As noted below, we have been in the process of upgrading the SDN-VE CI system to a Zuul-based system and this has caused an interruption in our voting. We have resolved the issues we were facing with standing up our new system and are close to having our system voting again. Daya (dkam...@us.ibm.com), who owns our new CI system, and myself will be participating in the Third Party Testing meetings on Monday and onward. Best, Mohammad
----- Kyle Mestery mest...@noironetworks.com wrote: -----
To: "OpenStack Development Mailing List (not for usage questions)" openstack-dev@lists.openstack.org
From: Kyle Mestery mest...@noironetworks.com
Date: 07/11/2014 11:58AM
Subject: [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
Since Juno-2 is quickly approaching, I wanted to update everyone on where we're at with regards to third party testing in Neutron. The etherpad here [1] was the original link with status. The link here [2] shows what is expected of Neutron third party CI systems. On the CI status side, I'd like to ask the owners of the following CI systems to attend Monday's third party meeting [3] to discuss the status of their CI systems. These are the ones which appear to be in trouble, aren't running, or have some issues.
1. Cisco
* Not enough logs being saved.
* Log retention issues.
2. Citrix Netscaler LBaaS driver
* I don't think this has a third party CI system running.
3. Embrane (both plugin and LBaaS driver)
* Logs are tarred up and not viewable in web browser.
* Inconsistent runs at times.
4. IBM SDN-VE
* Currently inactive, moving to a new system.
5. One Convergence
* Very high failure rate for patch runs.
6. OpenDaylight
* Logs are tarred up and not viewable in web browser.
7. PLUMgrid
* Not saving enough logs.
8. Radware
* Logs are not viewable in browser.
9. Tail-F
* Inconsistent past runs, need updates on status.
10. vArmour FWaaS driver
* Can't view logs.
* Inconsistent runs against patches.
I'd like to take some time in the Monday meeting to go over the issues these CI systems are having and give the maintainers a chance to discuss this with us. The third party team is hopeful we can spend the energy in the meeting working with CI maintainers who are actively interested in making progress on improving their CI systems. Per my email to the list in June [4], the expectation is that third party CI systems in Neutron are running and following the guidelines set forth by both Neutron and Infra. The weekly meeting is a place to seek help, and we're happy that a large number of third party CI owners and maintainers are using this resource. I'd also like to encourage anyone with a patch for a plugin or driver in Neutron to participate in the third-party meetings going forward as well. This will help to ensure your CI system is running while your patch is being reviewed, and you actively work to sort out issues during the review process to ensure smooth merging of your plugin or driver. Thank you! Kyle [1] https://etherpad.openstack.org/p/ZLp9Ow3tNq [2] https://wiki.openstack.org/wiki/NeutronThirdPartyTesting [3] https://wiki.openstack.org/wiki/Meetings/ThirdParty [4] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037665.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
On 11 July 2014 17:56, Kyle Mestery mest...@noironetworks.com wrote: 9. Tail-F * Inconsistent past runs, need updates on status. I've updated the Etherpad for our Tail-f CI and will be at the meeting. Cheers, -Luke ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Neutron permission issue
Hi The issue seems to be the following default config in the Neutron policy: "create_router:external_gateway_info:enable_snat": "rule:admin_only", "update_router:external_gateway_info:enable_snat": "rule:admin_only", Puzzling part is: from horizon, when I set an external gateway for a router, is it not the same thing as above? How does it allow it from horizon then? Ajay From: Ian Wells (iawells) iawe...@cisco.com Date: Friday, July 11, 2014 at 10:56 AM To: akalambu akala...@cisco.com, OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: openstack-systems-group (mailer list) openstack-systems-gr...@cisco.com Subject: Re: Neutron permission issue Check /etc/neutron/policy.json, but I agree that's weird... -- Ian. From: Ajay Kalambur (akalambu) akala...@cisco.com Date: Friday, 11 July 2014 10:05 To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: openstack-systems-group (mailer list) openstack-systems-gr...@cisco.com Subject: Neutron permission issue Hi As a tenant, when I try to create a router and associate a gateway with the router as a two-step process in Horizon, things work fine. Now when I want to do the same thing through a create router API call with the request below, I get permission denied to create the router:
    {
        "router": {
            "name": "another_router",
            "admin_state_up": true,
            "external_gateway_info": {
                "network_id": "3c5bcddd-6af9-4e6b-9c3e-c153e521cab8",
                "enable_snat": false
            }
        }
    }
The network id in both cases is the same. This does not make sense to me. Traceback (most recent call last): File vm-tp.py, line 54, in setUp ext_router = self.net.create_router(CONF.ROUTER_NAME, ext_net['id']) File /Users/akalambu/python_venv/latest_code/pns/network.py, line 121, in create_router router = self.neutron_client.create_router(body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 101, in with_params ret = self.function(instance, *args, **kwargs) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 398, in create_router return self.post(self.routers_path, body=body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1320, in post headers=headers, params=params) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1243, in do_request self._handle_fault_response(status_code, replybody) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1211, in _handle_fault_response exception_handler_v20(status_code, des_error_body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 68, in exception_handler_v20 status_code=status_code) Forbidden: Policy doesn't allow create_router to be performed. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
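For reference, the relevant default entries in /etc/neutron/policy.json look roughly like this as valid JSON; a deployment could relax the two enable_snat rules (for example to "rule:admin_or_owner") at its own risk:

    {
        "create_router": "rule:regular_user",
        "create_router:external_gateway_info:enable_snat": "rule:admin_only",
        "update_router:external_gateway_info:enable_snat": "rule:admin_only"
    }

With these defaults a tenant can still create a router with a gateway, as long as the request body omits enable_snat entirely, which is presumably what Horizon's two-step flow does.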
Re: [openstack-dev] Neutron permission issue
Never mind, figured it out: the rule is on enable_snat inside external_gateway_info; that was the issue. But I think there is an issue with update, because the message is misleading when I try to update with external_gateway_info and enable_snat: I get a message that the resource could not be found when in reality it's a permission issue. I got this exception on update router: /v2_0/client.py, line 1212, in _handle_fault_response exception_handler_v20(status_code, des_error_body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 74, in exception_handler_v20 message=error_dict) NeutronClientException: The resource could not be found. When I had the following body:
    body = {
        "router": {
            "name": "pns-router",
            "external_gateway_info": {
                "network_id": net_id,
                "enable_snat": False
            }
        }
    }
It should have thrown a policy error and not this. From: akalambu akala...@cisco.com Date: Friday, July 11, 2014 at 11:09 AM To: Ian Wells (iawells) iawe...@cisco.com, OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: Neutron permission issue Hi The issue seems to be the following default config in the Neutron policy: "create_router:external_gateway_info:enable_snat": "rule:admin_only", "update_router:external_gateway_info:enable_snat": "rule:admin_only", Puzzling part is: from horizon, when I set an external gateway for a router, is it not the same thing as above? How does it allow it from horizon then? Ajay From: Ian Wells (iawells) iawe...@cisco.com Date: Friday, July 11, 2014 at 10:56 AM To: akalambu akala...@cisco.com, OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: openstack-systems-group (mailer list) openstack-systems-gr...@cisco.com Subject: Re: Neutron permission issue Check /etc/neutron/policy.json, but I agree that's weird... -- Ian. From: Ajay Kalambur (akalambu) akala...@cisco.com Date: Friday, 11 July 2014 10:05 To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Cc: openstack-systems-group (mailer list) openstack-systems-gr...@cisco.com Subject: Neutron permission issue Hi As a tenant, when I try to create a router and associate a gateway with the router as a two-step process in Horizon, things work fine. Now when I want to do the same thing through a create router API call with the request below, I get permission denied to create the router:
    {
        "router": {
            "name": "another_router",
            "admin_state_up": true,
            "external_gateway_info": {
                "network_id": "3c5bcddd-6af9-4e6b-9c3e-c153e521cab8",
                "enable_snat": false
            }
        }
    }
The network id in both cases is the same.
This does not make sense to me Traceback (most recent call last): File vm-tp.py, line 54, in setUp ext_router = self.net.create_router(CONF.ROUTER_NAME, ext_net['id']) File /Users/akalambu/python_venv/latest_code/pns/network.py, line 121, in create_router router = self.neutron_client.create_router(body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 101, in with_params ret = self.function(instance, *args, **kwargs) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 398, in create_router return self.post(self.routers_path, body=body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1320, in post headers=headers, params=params) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1243, in do_request self._handle_fault_response(status_code, replybody) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 1211, in _handle_fault_response exception_handler_v20(status_code, des_error_body) File /Users/akalambu/python_venv/venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py, line 68, in exception_handler_v20 status_code=status_code) Forbidden: Policy doesn't allow create_router to be performed. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
So, how about we continue this in #openstack-state-management (or #openstack-oslo), since I think we've all made the point and different viewpoints visible (which was the main intention). Overall, I'd like to see asyncio more directly connected into taskflow so we can have the best of both worlds. We just have to be careful in letting people blow their feet off, vs. being too safe; but that discussion I think we can have outside this thread. Sound good? -Josh On Jul 11, 2014, at 9:04 AM, Clint Byrum cl...@fewbar.com wrote: Excerpts from Yuriy Taraday's message of 2014-07-11 03:08:14 -0700: On Thu, Jul 10, 2014 at 11:51 PM, Josh Harlow harlo...@outlook.com wrote: 2. Introspection, I hope this one is more obvious. When the coroutine call-graph is the workflow there is no easy way to examine it before it executes (and change parts of it, for example, before it executes). This is a nice feature imho when it's declaratively and explicitly defined; you get the ability to do this. This part is key to handling upgrades that typically happen (for example, a workflow where the 5th task was upgraded to a newer version: we need to stop the service, shut it off, do the code upgrade, restart the service and change the 5th task from v1 to v1.1). I don't really understand why one would want to examine or change a workflow before running. Shouldn't the workflow provide just enough info about which tasks should be run in what order? In the case with coroutines, when you do your upgrade and rerun the workflow, it'll just skip all steps that have already been run and run your new version of the 5th task. I'm kind of with you on this one. Changing the workflow feels like self-modifying code. 3. Dataflow: tasks in taskflow can not just declare workflow dependencies but also dataflow dependencies (this is how tasks transfer things from one to another). I suppose the dataflow dependency would mirror coroutine variables/arguments (except the variables/arguments would need to be persisted somewhere so that they can be passed back in on failure of the service running that coroutine). How is that possible without an abstraction over those variables/arguments (a coroutine can't store these things in local variables since those will be lost)? It would seem like this would need to recreate the persistence storage layer [5] that taskflow already uses for this purpose to accomplish this. You don't need to persist local variables. You just need to persist results of all tasks (and you have to do it if you want to support workflow interruption and restart). All dataflow dependencies are declared in the coroutine in plain Python, which is what developers are used to. That is actually the problem that using declarative systems avoids:

    @asyncio.coroutine
    def add_ports(ctx, server_def):
        port, volume = yield from asyncio.gather(
            ctx.run_task(create_port(server_def)),
            ctx.run_task(create_volume(server_def)))
        if server_def.wants_drbd:
            setup_drbd(volume, server_def)
        yield from ctx.run_task(boot_server(volume_az, server_def))

Now we have a side effect which is not in a task. If booting fails, and we want to revert, we won't revert the drbd. This is easy to miss because we're just using plain old python, and heck it already even has a test case. I see this type of thing a lot.. we're not arguing about capabilities, but about psychological differences. There are pros and cons to both approaches.
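For contrast, a hedged sketch of how the same flow might look in taskflow, with the DRBD step promoted to a task that knows how to revert itself; the task names and the make_volume/delete_volume_for/teardown_drbd helpers are illustrative, not an existing Nova or taskflow flow:

    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class CreateVolume(task.Task):
        default_provides = 'volume'

        def execute(self, server_def):
            return make_volume(server_def)        # assumed helper

        def revert(self, server_def, **kwargs):
            delete_volume_for(server_def)         # assumed helper

    class SetupDRBD(task.Task):
        def execute(self, volume, server_def):
            if server_def.wants_drbd:
                setup_drbd(volume, server_def)

        def revert(self, volume, server_def, **kwargs):
            teardown_drbd(volume, server_def)     # the step the coroutine misses

    class BootServer(task.Task):
        def execute(self, volume, server_def):
            boot_server(volume, server_def)

    flow = linear_flow.Flow('add-ports').add(
        CreateVolume(), SetupDRBD(), BootServer())
    engines.run(flow, store={'server_def': server_def})

Because every side effect lives in a task, a failure in BootServer walks the flow backwards and calls each revert(), including the DRBD teardown.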
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] Live migration objects
I'm contemplating how to fix https://bugs.launchpad.net/nova/+bug/1339823 and it seems that a part of the fix would be to track the state of live migrations in the database, more or less the same way that cold migrations are tracked. The thinking is that the logic could retrieve information about the live migration (particularly its state) and act accordingly, again similar to how incomplete cold migrations are handled during host initialization. I have been looking through the relevant code history and I can't find any information about why live migrations are not tracked in the database while cold migrations are. In any case, before I start writing a bunch of code, I was wondering whether others agree that tracking live migrations in the database seems like a reasonable approach and if so, whether existing Migration objects should be used for this purpose or if a new type (e.g. LiveMigration) should be introduced instead. I'm thinking the former approach would entail adding a flag to the existing Migration type to indicate the migration type (cold vs. live); although arguably less invasive, using this approach might break existing functionality that retrieves migration information. Any guidance would be appreciated. Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
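For the record, the less invasive option John mentions could be as small as a type discriminator on the existing table; a rough sketch in the sqlalchemy-migrate style Nova used at the time (the migration_type column name is hypothetical, not a merged schema change):

    from sqlalchemy import Column, MetaData, String, Table

    def upgrade(migrate_engine):
        meta = MetaData(bind=migrate_engine)
        migrations = Table('migrations', meta, autoload=True)
        # 'cold' would cover today's resize/cold-migrate rows; 'live' is new.
        # create_column() is provided by sqlalchemy-migrate's changeset
        # extension when the migration runs under the migrate framework.
        migrations.create_column(Column('migration_type', String(length=20),
                                        nullable=True))

Queries that today assume every Migration row is a cold migration would then need to filter on the new field, which is the compatibility risk John alludes to.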
Re: [openstack-dev] [cinder][replication-api] extra_specs too constant
Philipp, Thanks for the feedback, below is my view, and I would like to hear what others think. I would typically expect the replication_partners to be created/computed by the driver from the underlying replication mechanism. I assume DRBD knows with whom it is currently enabled for replication - I don't think this should be kept in the Cinder DB (in the extra specs of the volume-type). In the extra specs we may find replica_volume_backend_name, but I expect it to be a short list. As for the case of multiple appropriate replication targets, the current plan is to choose the 1st eligible, but we can change it to be a random entry from the list, if you think that is appropriate. Regarding the actual replication_rpo_range and network bandwidth, I think the current suggestion is a reasonable 1st step. Multiple considerations will of course impact the actual RPO, but I think this is outside the scope of this 1st revision - I would like to see this mechanism enhanced in the next revision. Ronen, From: Philipp Marek philipp.ma...@linbit.com To: openstack-dev@lists.openstack.org, Cc: Ronen Kat/Haifa/IBM@IBMIL Date: 11/07/2014 04:10 PM Subject: [openstack-dev][cinder][replication-api] extra_specs too constant I think that extra_specs in the database is too static, too hard to change. In the case of eg. DRBD, where many nodes may provide some storage space, the list replication_partners is likely to change often, even if the only changes are newly added nodes[1] This means that a) the admin has to add each node manually b) volume_type_extra_specs:value is a VARCHAR(255), which can only provide a few host names. (With FQDNs, even more so.) What if the list of hosts were matched by each one saying I'm product XYZ, version compat N-M (eg. via get_volume_stats), and all nodes that report the same product with an overlapping version range were considered eligible for replication? Furthermore, replication_rpo_range might depend on other circumstances too... if the network connection to the second site is heavily loaded, the RPO will vary, too - from a few seconds to a few hours. So, should we announce a range of (0,7200)? Ad 1: because Openstack sees by itself which nodes are available. -- : Ing. Philipp Marek : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com : DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
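To sketch Philipp's matching idea in code (the stat keys are hypothetical, not an agreed Cinder interface): each backend would advertise its product and a compatible version range via get_volume_stats(), and two backends would count as replication-eligible when the product matches and the ranges overlap:

    def replication_eligible(stats_a, stats_b):
        # stats_* are the dicts a driver would return from get_volume_stats().
        if stats_a.get('replication_product') != stats_b.get('replication_product'):
            return False
        lo_a, hi_a = stats_a['replication_compat']
        lo_b, hi_b = stats_b['replication_compat']
        return lo_a <= hi_b and lo_b <= hi_a   # version ranges overlap

    node1 = {'replication_product': 'DRBD', 'replication_compat': (8, 9)}
    node2 = {'replication_product': 'DRBD', 'replication_compat': (9, 9)}
    assert replication_eligible(node1, node2)

This would sidestep both the manual replication_partners bookkeeping and the VARCHAR(255) limit, since no host list would be stored at all.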
[openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
Hi Kyle, There is indeed a NetScaler CI and it is currently running API and scenario tests on LBAAS changes + driver changes. It also votes. What time is the Monday 3rd party meeting? Thanks, Vijay. Sent using CloudMagic On Fri, Jul 11, 2014 at 9:27 PM, Kyle Mestery mest...@noironetworks.com wrote: Since Juno-2 is quickly approaching, I wanted to update everyone on where we're at with regards to third party testing in Neutron. The etherpad here [1] was the original link with status. The link here [2] shows what is expected of Neutron third party CI systems. On the CI status side, I'd like to ask the owners of the following CI systems to attend Monday's third party meeting [3] to discuss the status of their CI systems. These are the ones which appear to be in trouble, aren't running, or have some issues. 1. Cisco * Not enough logs being saved. * Log retention issues. 2. Citrix Netscaler LBaaS driver * I don't think this has a third party CI system running. 3. Embrane (both plugin and LBaaS driver) * Logs are tarred up and not viewable in web browser. * Inconsistent runs at times. 4. IBM SDN-VE * Currently inactive, moving to a new system. 5. One Convergence * Very high failure rate for patch runs. 6. OpenDaylight * Logs are tarred up and not viewable in web browser 7. PLUMgrid * Not saving enough logs 8. Radware * Logs are not viewable in browser 9. Tail-F * Inconsistent past runs, need updates on status. 10. vArmour FWaaS driver * Can't view logs. * Inconsistent runs against patches. I'd like to take some time in the Monday meeting to go over the issues these CI systems are having and give the maintainers a chance to discuss this with us. The third party team is hopeful we can spend the energy in the meeting working with CI maintainers who are actively interested in making progress on improving their CI systems. Per my email to the list in June [4], the expectation is that third party CI systems in Neutron are running and following the guidelines set forth by both Neutron and Infra. The weekly meeting is a place to seek help, and we're happy that a large number of third party CI owners and maintainers are using this resource. I'd also like to encourage anyone with a patch for a plugin or driver in Neutron to participate in the third-party meetings going forward as well. This will help to ensure your CI system is running while your patch is being reviewed, and you actively work to sort out issues during the review process to ensure smooth merging of your plugin or driver. Thank you! Kyle [1] https://etherpad.openstack.org/p/ZLp9Ow3tNq [2] https://wiki.openstack.org/wiki/NeutronThirdPartyTesting [3] https://wiki.openstack.org/wiki/Meetings/ThirdParty [4] http://lists.openstack.org/pipermail/openstack-dev/2014-June/037665.html ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [congress] mid-cycle policy summit
I need feedback from the congress team on which two days work for you:
- 11-12 September
- 18-19 September
~sean On Jul 10, 2014, at 5:56 PM, Sean Roberts seanrobert...@gmail.com wrote: I'm thinking location as Yahoo Sunnyvale or VMware Palo Alto. ~sean On Jul 10, 2014, at 5:12 PM, sean roberts seanrobert...@gmail.com wrote: The Congress team would like to get us policy people together to discuss how each project is approaching policy and our common future prior to the Paris summit. More details about Congress can be found here: https://wiki.openstack.org/wiki/Congress. I have discussed the idea with mestery and mikal, but I wanted to include as many other projects as possible. I propose this agenda:
1. first day: each project talks about their policy approach
2. second day: whiteboarding and discussion about integrating our policy approaches
I propose a few dates:
- 11-12 September
- 18-19 September
~Sean Roberts ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Fuel] Feature Freeze for 5.1
Hi fuelers, it is time for Feature Freeze in 5.1. There are the following exceptions we agreed on at the IRC meeting yesterday [1], which need a bit more time (and actually are almost ready): 1. Mellanox support [2]. Extended FF date is July, 17th 2. Neutron NSX plugin [3] - July, 15th 3. Replacement of current ML2 implementation by [4] after it passes extensive testing by QA team, July 16th 4. Galera improvements, which improve HA [5] - July, 14th [1] http://eavesdrop.openstack.org/meetings/fuel/2014/fuel.2014-07-10-16.00.log.html [2] https://blueprints.launchpad.net/fuel/+spec/mellanox-features-support [3] https://blueprints.launchpad.net/fuel/+spec/neutron-nsx-plugin-integration [4] https://review.openstack.org/#/c/103280/ [5] https://blueprints.launchpad.net/fuel/+spec/galera-improvements Let me know quickly if I missed anything. -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Live migration objects
I'm contemplating how to fix https://bugs.launchpad.net/nova/+bug/1339823 and it seems that a part of the fix would be to track the state of live migrations in the database, more or less the same way that cold migrations are tracked. The thinking is that the logic could retrieve information about the live migration (particularly its state) and act accordingly, again similar to how incomplete cold migrations are handled during host initialization. I have been looking through the relevant code history and I can't find any information about why live migrations are not tracked in the database while cold migrations are. In any case, before I start writing a bunch of code, I was wondering whether others agree that tracking live migrations in the database seems like a reasonable approach and if so, whether existing Migration objects should be used for this purpose or if a new type (e.g. LiveMigration) should be introduced instead. There has been some effort and plans in the past to unify more than just the state tracking of live and cold migration. That effort is what needs to be done eventually. However, for your immediate bug, I say just have the compute host abandon the instance if the database says its host != self.host, and otherwise maybe just return it to a running state. I think it's fairly safe to say that if a live migration fails in the middle, there is little chance that it is running in a meaningful state on the source or destination host. As long as init_host() chooses a consistent way to determine ownership, that's *probably* the best we can do. --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
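A rough sketch of the init_host()-time recovery Dan describes; the names are illustrative, the abandon helper is assumed, and this is not the actual nova.compute.manager code:

    from nova.compute import task_states, vm_states

    def _init_instance(self, context, instance):
        # Called per-instance when the compute service starts up.
        if instance.task_state == task_states.MIGRATING:
            if instance.host != self.host:
                # The DB says another host owns it now: abandon our copy.
                self._abandon_local_instance(context, instance)  # assumed helper
            else:
                # We still own it; the live migration clearly didn't finish,
                # so just return the instance to a running state.
                instance.task_state = None
                instance.vm_state = vm_states.ACTIVE
                instance.save()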
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
I can do another release once https://review.openstack.org/#/c/106447/ merges. Michael On Sat, Jul 12, 2014 at 3:51 AM, Russell Bryant rbry...@redhat.com wrote: On 07/11/2014 01:27 PM, Russell Bryant wrote: On 07/11/2014 05:29 AM, Thierry Carrez wrote: Matthias Runge wrote: On 11/07/14 02:04, Michael Still wrote: Sorry for the delay here. This email got lost in my inbox while I was travelling. This release is now tagged. Additionally, I have created a milestone for this release in launchpad, which is the keystone process for client releases. This means that users of launchpad can now see what release a given bug was fixed in, and improves our general launchpad bug hygiene. However, because we haven't done this before, this first release is a bit bigger than it should be. I'm having some pain marking the milestone as released in launchpad, but I am arguing with launchpad about that now. Michael Cough, this broke horizon stable and master; heat stable is affected as well. For Horizon, I filed bug https://bugs.launchpad.net/horizon/+bug/1340596 The same bug (https://bugs.launchpad.net/bugs/1340596) will be used to track Heat tasks as well. Thanks for pointing this out. These non-backwards-compatible changes should not have been merged, IMO. They really should have waited until a v2.0, or at least been done in a backwards-compatible way. I'll look into what reverts are needed. I posted a couple of reverts that I think will resolve these problems: https://review.openstack.org/#/c/106446/ https://review.openstack.org/#/c/106447/ -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] request to tag novaclient 2.18.0
On Fri, Jul 11, 2014 at 4:42 AM, Jeremy Stanley fu...@yuggoth.org wrote: On 2014-07-11 11:21:19 +0200 (+0200), Matthias Runge wrote: this broke horizon stable and master; heat stable is affected as well. [...] I guess this is a plea for applying something like the oslotest framework to client libraries so they get backward-compat jobs run against unit tests of all dependent/consuming software... branchless tempest already alleviates some of this, but not the case of changes in a library which will break unit/functional tests of another project. We actually do have some tests for backwards compatibility, and they all passed. Presumably because both heat and horizon have poor integration tests. We ran - check-tempest-dsvm-full-havana http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-full-havana/8e09faa SUCCESS in 40m 47s (non-voting) - check-tempest-dsvm-neutron-havana http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-neutron-havana/b4ad019 SUCCESS in 36m 17s (non-voting) - check-tempest-dsvm-full-icehouse http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-full-icehouse/c0c62e5 SUCCESS in 53m 05s - check-tempest-dsvm-neutron-icehouse http://logs.openstack.org/66/94166/3/check/check-tempest-dsvm-neutron-icehouse/a54aedb SUCCESS in 57m 28s on the offending patches (https://review.openstack.org/#/c/94166/) Infra patch that added these tests: https://review.openstack.org/#/c/80698/ -- Jeremy Stanley ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Containers][Nova] Containers Team Mid-Cycle Meetup to join Nova Meetup
Containers Team, We have decided to hold our Mid-Cycle meetup along with the Nova Meetup in Beaverton, Oregon on Aug 28-31. The Nova Meetup is scheduled for Aug 28-30. https://www.eventbrite.com.au/e/openstack-nova-juno-mid-cycle-developer-meetup-tickets-11878128803 Those of us interested in the Containers topic will use one of the breakout rooms generously offered by Intel. We will also stay on Thursday to focus on implementation plans and to engage with those members of the Nova Team who will be otherwise occupied on Aug 28-30, and will have a chance to focus entirely on Containers on the 31st. Please take a moment now to register using the link above, and I look forward to seeing you there. Thanks, Adrian Otto ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Fuel] [OSTF] OSTF stops working after password is changed
I'm wondering if we can show all these windows ONLY if there is an authz failure with the existing credentials from Nailgun. So the flow would be: the user clicks on the Run tests button, healthcheck tries to access OpenStack and fails. It shows text fields to enter tenant/user/pass with a message similar to: Default administrative credentials to OpenStack were changed since deployment time. Please provide current credentials so HealthCheck can access OpenStack and run verification tests. I think it should be more obvious this way... Anyway, it must be a choice for the user whether to store creds in the browser. On Fri, Jul 11, 2014 at 8:50 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Hi, In the current implementation we store provided credentials in browser local storage. What's your opinion on that? Maybe we shouldn't store new credentials at all, even in the browser? So users have to enter them manually every time they want to run OSTF. 2014-06-25 13:47 GMT+04:00 Dmitriy Shulyak dshul...@mirantis.com: It is possible to change everything, so: username, password and tenant fields. Also this way we will be able to run tests not only as the admin user. On Wed, Jun 25, 2014 at 12:29 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Dmitry, Fields or field? Do we need to provide the password only, or are other credentials needed? 2014-06-25 13:02 GMT+04:00 Dmitriy Shulyak dshul...@mirantis.com: Looks like we will stick to option #2, as the most reliable one. - we have no way to know that openrc is changed, even if some scripts rely on it - ostf should not fail with an auth error - we can create an ostf user in the post-deployment stage, but i heard that some ceilometer tests relied on the admin user, also the operator may not want to create an additional user, for some reasons So, is everybody ok with additional fields on the HealthCheck tab? On Fri, Jun 20, 2014 at 8:17 PM, Andrew Woodward xar...@gmail.com wrote: The openrc file has to be up to date for some of the HA scripts to work, we could just source that. On Fri, Jun 20, 2014 at 12:12 AM, Sergii Golovatiuk sgolovat...@mirantis.com wrote: +1 for #2. ~Sergii On Fri, Jun 20, 2014 at 1:21 AM, Andrey Danin ada...@mirantis.com wrote: +1 to Mike. Let the user provide actual credentials and use them in place. On Fri, Jun 20, 2014 at 2:01 AM, Mike Scherbakov mscherba...@mirantis.com wrote: I'm in favor of #2. I think users might not want to have their password stored on the Fuel Master node. And if so, then it actually means we should not save it when the user provides it on the HealthCheck tab. On Thu, Jun 19, 2014 at 8:05 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Hi folks, We have a bug which prevents OSTF from working if the user changes the password which was used for the initial installation. I skimmed through the comments and it seems there are 2 viable options:
1. Create a separate user just for OSTF during OpenStack installation
2. Provide a field for the password in the UI so the user could provide the actual password in case it was changed
What do you guys think? Which option is better? -- Vitaly Kramskikh, Software Engineer, Mirantis, Inc.
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Andrey Danin ada...@mirantis.com skype: gcon.monolake ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Andrew Mirantis Ceph community ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Vitaly Kramskikh, Software Engineer, Mirantis, Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Vitaly Kramskikh, Software Engineer, Mirantis, Inc.
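A minimal sketch of the credentials probe Mike describes above, assuming python-keystoneclient with v2.0 auth; the function name and prompt flow are illustrative:

    from keystoneclient import exceptions
    from keystoneclient.v2_0 import client as ks_client

    def stored_creds_still_valid(auth_url, username, password, tenant):
        try:
            ks_client.Client(username=username, password=password,
                             tenant_name=tenant, auth_url=auth_url)
            return True
        except exceptions.Unauthorized:
            # Credentials changed since deployment: prompt the user for
            # current ones instead of failing the run with an auth error.
            return False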
[openstack-dev] [nova][neutron] Networks without subnets
Hi, A bug titled Creating quantum L2 networks (without subnets) doesn't work as expected (https://bugs.launchpad.net/nova/+bug/1039665) was reported quite some time ago. Beyond the discussion in the bug report, there have been related bugs reported a few times. * https://bugs.launchpad.net/nova/+bug/1304409 * https://bugs.launchpad.net/nova/+bug/1252410 * https://bugs.launchpad.net/nova/+bug/1237711 * https://bugs.launchpad.net/nova/+bug/1311731 * https://bugs.launchpad.net/nova/+bug/1043827 BZs on this subject seem to have a hard time surviving. They get marked as incomplete or invalid, or, in the related issues, the problem NOT related to the feature is addressed and the bug closed. We seem to dance around actually getting around to implementing this. The multiple reports show there *is* interest in this functionality, but at the moment we are without an actual implementation. At the moment there are multiple related blueprints: * https://review.openstack.org/#/c/99873/ ML2 OVS: portsecurity extension support * https://review.openstack.org/#/c/106222/ Add Port Security Implementation in ML2 Plugin * https://review.openstack.org/#/c/97715 NFV unaddressed interfaces The first two blueprints, besides appearing to be very similar, propose implementing the port security extension currently employed by one of the neutron plugins. It is related to this issue as it allows a port to be configured indicating it does not want security groups to apply. This is relevant because without an address, a security group cannot be applied, and this is treated as an error. Being able to specify skipping the security group criteria gets us a port on the network without an address, which is what happens when there is no subnet. The third approach is, on the face of it, related in that it proposes an interface without an address. However, on review it seems that the intent is not necessarily in line with some of the BZs mentioned above. Indeed there is text that seems to pretty clearly state that it is not intended to cover the port-without-an-IP situation. As an aside, the title in the commit message in the review could use revising. In order to implement something that finally delivers the functionality alluded to in the above BZs in Juno, we need to settle on a blueprint and direction. Barring the happy possibility of a resolution beforehand, can this be made an agenda item in the next Nova and/or Neutron meetings? Cheers, Brent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
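For anyone who wants to reproduce the basic case, a minimal sketch with python-neutronclient; the endpoint and credentials are placeholders. The Neutron side succeeds; booting a Nova instance on such a network is where the failures in the BZs above begin:

    from neutronclient.v2_0 import client

    neutron = client.Client(username='demo', password='secret',
                            tenant_name='demo',
                            auth_url='http://192.0.2.1:5000/v2.0')
    # An L2-only network: note the absence of any create_subnet() call.
    l2_net = neutron.create_network({'network': {'name': 'l2-only'}})['network']
    # Neutron happily creates a port with no fixed IP on such a network,
    # but security group rules cannot be applied to it without an address,
    # which is what trips up the Nova boot path.
    port = neutron.create_port({'port': {'network_id': l2_net['id']}})['port']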
Re: [openstack-dev] [congress] mid-cycle policy summit
Thanks for initiating this discussion. We would be happy to participate and host this at the Cisco office as well if need be. ~Sumit. On Fri, Jul 11, 2014 at 12:32 PM, Sean Roberts seanrobert...@gmail.com wrote: I need feedback from the congress team on which two days work for you:
- 11-12 September
- 18-19 September
~sean On Jul 10, 2014, at 5:56 PM, Sean Roberts seanrobert...@gmail.com wrote: I'm thinking location as Yahoo Sunnyvale or VMware Palo Alto. ~sean On Jul 10, 2014, at 5:12 PM, sean roberts seanrobert...@gmail.com wrote: The Congress team would like to get us policy people together to discuss how each project is approaching policy and our common future prior to the Paris summit. More details about Congress can be found here: https://wiki.openstack.org/wiki/Congress. I have discussed the idea with mestery and mikal, but I wanted to include as many other projects as possible. I propose this agenda:
1. first day: each project talks about their policy approach
2. second day: whiteboarding and discussion about integrating our policy approaches
I propose a few dates:
- 11-12 September
- 18-19 September
~Sean Roberts ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] mid-cycle policy summit
Hi Sean, Yes - please make sure it is going to be in the Bay Area this time :-) Thanks, - Stephen On Thu, Jul 10, 2014 at 5:56 PM, Sean Roberts seanrobert...@gmail.com wrote: I'm thinking location as yahoo Sunnyvale or VMware Palo Alto. ~sean On Jul 10, 2014, at 5:12 PM, sean roberts seanrobert...@gmail.com wrote: The Congress team would like to get us policy people together to discuss how each project is approaching policy and our common future prior to the Paris summit. More details about the Congress can be found here https://wiki.openstack.org/wiki/Congress. I have discussed the idea with mestery and mikal, but I wanted to include as many other projects as possible. I propose this agenda 1. first day each project talks about their policy approach 2. second day whiteboarding and discussion about integrating our policy approaches I propose a few dates - 11-12 September - 18-19 September ~Sean Roberts ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
I have tried using pymysql in place of mysqldb and in real world concurrency tests against cinder and nova it performs slower. I was inspired by the mention of mysql-connector so I just tried that option instead. Mysql-connector seems to be slightly slower as well, which leads me to believe that the blocking inside of sqlalchemy is not the main bottleneck across projects. Vish P.S. The performance in all cases was abysmal, so performance work definitely needs to be done, but just the guess that replacing our mysql library is going to solve all of our performance problems appears to be incorrect at first blush. On Jul 11, 2014, at 10:20 AM, Clark Boylan clark.boy...@gmail.com wrote: Before we get too far ahead of ourselves, mysql-connector is not hosted on PyPI. Instead it is an external package link. We recently managed to remove all packages that are hosted as external package links from openstack and will not add new ones in. Before we can use mysql-connector in the gate, Oracle will need to publish mysql-connector on PyPI properly. That said, there is at least one other pure-Python alternative, PyMySQL. PyMySQL supports py3k and pypy. We should look at using PyMySQL instead if we want to start with a reasonable path to getting this in the gate. Clark On Fri, Jul 11, 2014 at 10:07 AM, Miguel Angel Ajo Pelayo mangel...@redhat.com wrote: +1 here too, Amazed with the performance gains, x2.4 seems a lot, and we'd get rid of deadlocks. - Original Message - +1 I'm pretty excited about the possibilities here. I've had this mysqldb/eventlet contention in the back of my mind for some time now. I'm glad to see some work being done in this area. Carl On Fri, Jul 11, 2014 at 7:04 AM, Ihar Hrachyshka ihrac...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 On 09/07/14 13:17, Ihar Hrachyshka wrote: Hi all, Multiple projects are suffering from db lock timeouts due to deadlocks deep in the mysqldb library that we use to interact with mysql servers. In essence, the problem is due to missing eventlet support in the mysqldb module, meaning that when a db lock is encountered, the library does not yield to the next green thread to allow other threads to eventually release the grabbed lock; instead it just blocks the main thread, which eventually raises a timeout exception (OperationalError). The failed operation is not retried, leaving the failing request unserved. In Nova, there is a special retry mechanism for deadlocks, though I think it's more a hack than a proper fix. Neutron is one of the projects that suffer from those timeout errors a lot. Partly it's due to lack of discipline in how we do nested calls in l3_db and ml2_plugin code, but that's not something to change in the foreseeable future, so we need to find another solution that is applicable for Juno. Ideally, the solution should be applicable for Icehouse too, to allow distributors to resolve existing deadlocks without waiting for Juno. We've had several discussions and attempts to introduce a solution to the problem. Thanks to the oslo.db guys, we now have a more or less clear view of the cause of the failures and how to easily fix them. The solution is to switch from mysqldb to something eventlet-aware. The best candidate is probably the MySQL Connector module, which is an official MySQL client for Python and shows some (preliminary) good results in terms of performance. I've done additional testing, creating 2000 networks in parallel (10 thread workers) for both drivers and comparing results.
With mysqldb: 215.81 sec With mysql-connector: 88.66 ~2.4 times performance boost, ok? ;) I think we should switch to that library *even* if we forget about all the nasty deadlocks we experience now. I've posted a Neutron spec for the switch to the new client in Juno at [1]. Ideally, the switch is just a matter of several fixes to oslo.db that would enable full support for the new driver already supported by SQLAlchemy, plus the 'connection' string modified in service configuration files, plus documentation updates to refer to the new official way to configure services for MySQL. The database code won't, ideally, require any major changes, though some adaptation for the new client library may be needed. That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how trivially the switch can be achieved for a service from the example for Neutron [2]. While this is a Neutron-specific proposal, there is an obvious wish to switch to the new library globally throughout all the projects, to reduce devops burden, among other things. My vision is that, ideally, we switch all projects to the new library in Juno, though we may still leave several projects for K in case any issues arise, similar to the way projects switched to
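A small experiment that makes the blocking behavior under discussion visible; it assumes a local MySQL reachable with these test credentials, and the point is that eventlet.monkey_patch() can only patch pure-Python socket usage, so PyMySQL yields during a slow query while the C-based MySQLdb stalls every green thread:

    import eventlet
    eventlet.monkey_patch()
    import time

    def slow_query(modname):
        db = __import__(modname)          # 'pymysql' or 'MySQLdb'
        conn = db.connect(host='localhost', user='test', passwd='test',
                          db='test')
        conn.cursor().execute('SELECT SLEEP(2)')  # server-side wait, like a held lock

    def heartbeat():
        for _ in range(4):
            print('heartbeat at %.1f' % time.time())
            eventlet.sleep(0.5)

    pool = eventlet.GreenPool()
    pool.spawn(heartbeat)
    pool.spawn(slow_query, 'pymysql')     # heartbeat keeps ticking
    # pool.spawn(slow_query, 'MySQLdb')   # heartbeat freezes for the full 2s
    pool.waitall()

This illustrates the deadlock mechanism Ihar describes; as Vish notes, it says nothing by itself about overall throughput, which seems to be dominated by other costs.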
Re: [openstack-dev] [Containers][Nova] Containers Team Mid-Cycle Meetup to join Nova Meetup
CORRECTION: This event happens July 28-31. Sorry for any confusion! Corrected Announcement: Containers Team, We have decided to hold our Mid-Cycle meetup along with the Nova Meetup in Beaverton, Oregon on July 28-31. The Nova Meetup is scheduled for July 28-30. https://www.eventbrite.com.au/e/openstack-nova-juno-mid-cycle-developer-meetup-tickets-11878128803 Those of us interested in the Containers topic will use one of the breakout rooms generously offered by Intel. We will also stay on Thursday to focus on implementation plans and to engage with those members of the Nova Team who will be otherwise occupied on July 28-30, and will have a chance to focus entirely on Containers on the 31st. Please take a moment now to register using the link above, and I look forward to seeing you there. Thanks, Adrian Otto ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova][all] Old review expiration
I decided to do some Nova reviews today, and I decided to do it by pulling up the list of all of them and starting from the end. What I've found is a *lot* of reviews that have been idle for several months. I've even found one or two that were approved, but weren't merged due to depending on an outdated patch, and I found others that had several +1s but no -1s or +2s. There are 2 or 3 pages of these old reviews hanging at the end of the list, and it made me ask about the auto-expiration we used to have. I missed it in the Gerrit upgrade thread, but it turns out that, since core reviewers can now abandon/restore patches, the auto-expiration has been turned off. Given that we have so many old reviews hanging around on nova (and probably other projects), should we consider setting something like that back up? With nova, at least, the vast majority of them can't possibly merge because they're so old, so we need to at least have something to remind the developer that they need to rebase... and if they've forgotten the review or don't care about it anymore, we should either have it taken over or get the review abandoned. The other concern I have is the several reviews that no core dev looked at in an entire month, but I have no solutions to suggest there, unfortunately :( -- Kevin L. Mitchell kevin.mitch...@rackspace.com Rackspace ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron] Networks without subnets
FWIW, I believe TripleO will need this if we're going to be able to do https://blueprints.launchpad.net/tripleo/+spec/tripleo-on-openstack Being able to have instances without IPs assigned is basically required for that. -Ben On 07/11/2014 04:41 PM, Brent Eagles wrote: Hi, A bug titled Creating quantum L2 networks (without subnets) doesn't work as expected (https://bugs.launchpad.net/nova/+bug/1039665) was reported quite some time ago. Beyond the discussion in the bug report, there have been related bugs reported a few times. * https://bugs.launchpad.net/nova/+bug/1304409 * https://bugs.launchpad.net/nova/+bug/1252410 * https://bugs.launchpad.net/nova/+bug/1237711 * https://bugs.launchpad.net/nova/+bug/1311731 * https://bugs.launchpad.net/nova/+bug/1043827 BZs on this subject seem to have a hard time surviving. They get marked as incomplete or invalid, or, in the related issues, the problem NOT related to the feature is addressed and the bug closed. We seem to dance around actually getting around to implementing this. The multiple reports show there *is* interest in this functionality, but at the moment we are without an actual implementation. At the moment there are multiple related blueprints: * https://review.openstack.org/#/c/99873/ ML2 OVS: portsecurity extension support * https://review.openstack.org/#/c/106222/ Add Port Security Implementation in ML2 Plugin * https://review.openstack.org/#/c/97715 NFV unaddressed interfaces The first two blueprints, besides appearing to be very similar, propose implementing the port security extension currently employed by one of the neutron plugins. It is related to this issue as it allows a port to be configured indicating it does not want security groups to apply. This is relevant because without an address, a security group cannot be applied, and this is treated as an error. Being able to specify skipping the security group criteria gets us a port on the network without an address, which is what happens when there is no subnet. The third approach is, on the face of it, related in that it proposes an interface without an address. However, on review it seems that the intent is not necessarily in line with some of the BZs mentioned above. Indeed there is text that seems to pretty clearly state that it is not intended to cover the port-without-an-IP situation. As an aside, the title in the commit message in the review could use revising. In order to implement something that finally delivers the functionality alluded to in the above BZs in Juno, we need to settle on a blueprint and direction. Barring the happy possibility of a resolution beforehand, can this be made an agenda item in the next Nova and/or Neutron meetings? Cheers, Brent ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ironic] July Reviewer review
Hi all! I skipped looking at our review stats last month - sorry about that. I'll try to do this more consistently at the beginning of each month, even if there's not much change. We're about at the middle of the cycle anyway, so now is a really good time to look back and see how the team activity has changed since the summit. Regarding the project's overall pace, compared to my last summary at the end of May, I think there's been a noticeable improvement in review quality and pace. On the other hand, we've had a lot of specs proposed, and there was very little movement to review them for about a month, which frustrated some developers. I think there was good activity last week on spec reviews, but we could use more volunteers for that moving forward. Ping me if you're interested. Some highlights: - our average patches-per-changeset has gone down by 20% (4.2 to 3.4), while the average new changesets-per-day has gone up (5.1 to 6.3). I would like to think this is because the specs process is helping to improve the quality of our code submissions, and because we all spent some time focusing on improving stability and maintainability of the codebase. - we've merged almost 50% more changes in the last month than we did in May. Also, I'm delighted to say that I think a few people are ready to join the core review team. I'm starting a separate thread for each change, but the short version is I'd like to propose the following change:
+ jroll
+ Shrews
- romcheg
The current team roster can be found here: https://review.openstack.org/#/admin/groups/165 Without further ado, here are the 30 and 60 day stats, truncated at the one-review-per-day water mark.

Reviews for the last 30 days in ironic (** -- ironic-core team member)
+----------------+----------------------------------------+----------------+
| Reviewer       | Reviews   -2   -1   +1   +2  +A  +/- % | Disagreements* |
+----------------+----------------------------------------+----------------+
| rloo **        |     105   20   43    2   40  11  40.0% |    6 (  5.7%)  |
| lucasagomes ** |     100    1   50    4   45  13  49.0% |    4 (  4.0%)  |
| mrda           |      90    0   13   77    0   0  85.6% |   21 ( 23.3%)  |
| devananda **   |      68    9   20    4   35  16  57.4% |    5 (  7.4%)  |
| dtantsur **    |      62    2   19    1   40  18  66.1% |    6 (  9.7%)  |
| YuikoTakada    |      58    0   10   48    0   0  82.8% |   14 ( 24.1%)  |
| whaom **       |      53    0   10   17   26   6  81.1% |   14 ( 26.4%)  |
| dshrews        |      49    0   12   37    0   0  75.5% |    9 ( 18.4%)  |
| jimrollenhagen |      47    0   11   21   15   8  76.6% |    6 ( 12.8%)  |
| nobodycam **   |      45    1   15    2   27   5  64.4% |    4 (  8.9%)  |
| ghe.rivero     |      30    0    4   26    0   0  86.7% |    3 ( 10.0%)  |
+----------------+----------------------------------------+----------------+
Total reviews: 898 (29.9/day)
Total reviewers: 47 (avg 0.6 reviews/day)
Total reviews by core team: 531 (17.7/day)
Core team size: 11 (avg 1.6 reviews/day)
New patch sets in the last 30 days: 645 (21.5/day)
Changes involved in the last 30 days: 188 (6.3/day)
New changes in the last 30 days: 124 (4.1/day)
Changes merged in the last 30 days: 89 (3.0/day)
Changes abandoned in the last 30 days: 12 (0.4/day)
Changes left in state WIP in the last 30 days: 0 (0.0/day)
Queue growth in the last 30 days: 23 (0.8/day)
Average number of patches per changeset: 3.4

Reviews for the last 60 days in ironic (** -- ironic-core team member)
+----------------+----------------------------------------+----------------+
| Reviewer       | Reviews   -2   -1   +1   +2  +A  +/- % | Disagreements* |
+----------------+----------------------------------------+----------------+
| mrda           |     257    0   36  221    0   0  86.0% |   46 ( 17.9%)  |
| lucasagomes ** |     234    6   91   10  127  32  58.5% |   14 (  6.0%)  |
| rloo **        |     180   21   70    3   86  25  49.4% |    8 (  4.4%)  |
| dtantsur **    |     177    4   75   29   69  29  55.4% |   18 ( 10.2%)  |
| devananda **   |     131   21   46    6   58  26  48.9% |    8 (  6.1%)  |
| whaom **       |     118    0   22   24   72  13  81.4% |   22 ( 18.6%)  |
| jimrollenhagen |     112    0   26   60   26   9  76.8% |   15 ( 13.4%)  |
| YuikoTakada    |     104    0   17   87    0   0  83.7% |   26 ( 25.0%)  |
| dshrews        |      88    0   13   75    0   0  85.2% |   14 ( 15.9%)  |
| ghe.rivero     |      83    0    7   76    0   0  91.6% |    8 (  9.6%)  |
| yuriyz         |      71    0   27    2   42   9  62.0% |    3 (  4.2%)  |
| nobodycam **   |      65    1   21    4   39  12  66.2% |    7 ( 10.8%)  |
+----------------+----------------------------------------+----------------+
Total reviews: 1976 (32.9/day)
Total reviewers: 54 (avg 0.6 reviews/day)
Total reviews by core team: 1117 (18.6/day)
[openstack-dev] [Ironic] Nominating David Shrewsbury to ironic-core
Hi all! While David (Shrews) only began working on Ironic in earnest four months ago, he has been working on some of the tougher problems with our Tempest coverage and the Nova-Ironic interactions. He's also become quite active in reviews and discussions on IRC, and demonstrated a good understanding of the challenges facing Ironic today. I believe he'll also make a great addition to the core team. Below are his stats for the last 90 days. Cheers, Devananda

+------+----------+----------------------------------------+----------------+
| Days | Reviewer | Reviews   -2   -1   +1   +2  +A  +/- % | Disagreements* |
+------+----------+----------------------------------------+----------------+
|  30  | dshrews  |      47    0   11   36    0   0  76.6% |    7 ( 14.9%)  |
|  60  | dshrews  |      91    0   14   77    0   0  84.6% |   15 ( 16.5%)  |
|  90  | dshrews  |     121    0   21  100    0   0  82.6% |   16 ( 13.2%)  |
+------+----------+----------------------------------------+----------------+
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ironic] Nominating Jim Rollenhagen to ironic-core
Hi all! It's time to grow the team :) Jim (jroll) started working with Ironic at the last mid-cycle, when teeth became ironic-python-agent. In the time since then, he's jumped into Ironic to help improve the project as a whole. In the last few months, in both reviews and discussions on IRC, I have seen him consistently demonstrate a solid grasp of Ironic's architecture and its role within OpenStack, contribute meaningfully to design discussions, and help many other contributors. I think he will be a great addition to the core review team. Below are his review stats for Ironic, as calculated by the openstack-infra/reviewstats project with local modification to remove ironic-python-agent, so we can see his activity in the main project. Cheers, Devananda

+------+----------------+----------------------------------------+----------------+
| Days | Reviewer       | Reviews   -2   -1   +1   +2  +A  +/- % | Disagreements* |
+------+----------------+----------------------------------------+----------------+
|  30  | jimrollenhagen |      29    0    8   21    0   0  72.4% |    5 ( 17.2%)  |
|  60  | jimrollenhagen |      76    0   16   60    0   0  78.9% |   13 ( 17.1%)  |
|  90  | jimrollenhagen |     106    0   27   79    0   0  74.5% |   25 ( 23.6%)  |
| 180  | jimrollenhagen |     157    0   41  116    0   0  73.9% |   35 ( 22.3%)  |
+------+----------------+----------------------------------------+----------------+
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Ironic] proposal to remove Roman Prykhodchenko from the core team
While expanding the core review team is important, keeping the team's collective knowledge of the project consistent and accurate is equally important, and participation in reviews and design discussions is how we do that. Roman's (romcheg) review activity has been steadily dropping over the last six months, and even though he is working on the nova db to ironic db migration tool, I don't feel that he is active enough in code reviews or the design process, and has not been for some time. While I would welcome him back to the core team if his review activity were to pick up again, I feel it's best to remove him at this time. Below are his stats over the last six months. Regards, Devananda

+------+---------------+----------------------------------------+----------------+
| Days | Reviewer      | Reviews   -2   -1   +1   +2  +A  +/- % | Disagreements* |
+------+---------------+----------------------------------------+----------------+
|  30  | prykhodchenko |      18    0    6    2   10   2  66.7% |    4 ( 22.2%)  |
|  60  | prykhodchenko |      44    0   14    6   24  10  68.2% |    6 ( 13.6%)  |
|  90  | prykhodchenko |      59    0   18    6   35  14  69.5% |    6 ( 10.2%)  |
| 180  | prykhodchenko |     150    3   46   21   80  28  67.3% |   12 (  8.0%)  |
+------+---------------+----------------------------------------+----------------+
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Fuel] [OSTF] OSTF stops working after password is changed
I think showing this only upon failure is good if the user is also given the option to store the credentials in the browser. That way, you only have to re-enter the credentials once if you want convenience, or do it every time if you want improved security. One downside would be that if you don't cache the credentials, you'll have to "fail" the auth every time to be given the chance to re-enter the credentials. It may not be obvious that clicking "run tests" will then let you enter new credentials. I was thinking that having a button you can press to enter the credentials would make it more obvious, but wouldn't reduce the number of clicks, i.e. either run tests and fail, or click "Enter credentials" and enter new ones. The "Enter credentials" option would obviously be a little faster. - David J. Easter Director of Product Management, Mirantis, Inc. From: Mike Scherbakov mscherba...@mirantis.com Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Friday, July 11, 2014 at 2:36 PM To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [Fuel] [OSTF] OSTF stops working after password is changed I'm wondering if we can show all these windows ONLY if there is authz failure with existing credentials from Nailgun. So the flow would be: user clicks on the Run tests button, healthcheck tries to access OpenStack and fails. It then shows text fields to enter tenant/user/pass, with a message similar to "Default administrative credentials to OpenStack were changed since the deployment time. Please provide current credentials so HealthCheck can access OpenStack and run verification tests." I think it should be more obvious this way... Anyway, it must be a choice for the user whether to store creds in the browser. On Fri, Jul 11, 2014 at 8:50 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Hi, In the current implementation we store provided credentials in browser local storage. What's your opinion on that? Maybe we shouldn't store new credentials at all, even in the browser? So users have to enter them manually every time they want to run OSTF. 2014-06-25 13:47 GMT+04:00 Dmitriy Shulyak dshul...@mirantis.com: It is possible to change everything, so: username, password and tenant fields. Also this way we will be able to run tests not only as the admin user On Wed, Jun 25, 2014 at 12:29 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Dmitry, Fields or field? Do we need to provide password only, or are other credentials needed? 2014-06-25 13:02 GMT+04:00 Dmitriy Shulyak dshul...@mirantis.com: Looks like we will stick to the #2 option, as the most reliable one. - we have no way to know that openrc is changed, even if some scripts rely on it - ostf should not fail with auth error - we can create an ostf user in the post-deployment stage, but i heard that some ceilometer tests relied on the admin user, also the operator may not want to create an additional user, for some reasons So, is everybody ok with additional fields on the HealthCheck tab? On Fri, Jun 20, 2014 at 8:17 PM, Andrew Woodward xar...@gmail.com wrote: The openrc file has to be up to date for some of the HA scripts to work, we could just source that. On Fri, Jun 20, 2014 at 12:12 AM, Sergii Golovatiuk sgolovat...@mirantis.com wrote: +1 for #2. ~Sergii On Fri, Jun 20, 2014 at 1:21 AM, Andrey Danin ada...@mirantis.com wrote: +1 to Mike. Let the user provide actual credentials and use them in place.
On Fri, Jun 20, 2014 at 2:01 AM, Mike Scherbakov mscherba...@mirantis.com wrote: I'm in favor of #2. I think users might not want to have their password stored on the Fuel Master node. And if so, then it actually means we should not save it when the user provides it on the HealthCheck tab. On Thu, Jun 19, 2014 at 8:05 PM, Vitaly Kramskikh vkramsk...@mirantis.com wrote: Hi folks, We have a bug which prevents OSTF from working if a user changes the password which was used for the initial installation. I skimmed through the comments and it seems there are 2 viable options: 1) Create a separate user just for OSTF during OpenStack installation; 2) Provide a field for a password in the UI so the user could provide the actual password in case it was changed. What do you guys think? Which option is better? -- Vitaly Kramskikh, Software Engineer, Mirantis, Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Mike Scherbakov #mihgen ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Andrey Danin ada...@mirantis.com skype: gcon.monolake
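A minimal sketch of the flow Mike proposes, assuming keystoneclient semantics of the time (this is illustrative, not actual Fuel/OSTF code; prompt_user stands in for the HealthCheck tab UI):

    from keystoneclient import exceptions
    from keystoneclient.v2_0 import client as keystone

    def authenticate(stored_creds, prompt_user):
        try:
            # First try whatever credentials Nailgun already knows.
            return keystone.Client(**stored_creds)
        except exceptions.Unauthorized:
            # Admin credentials were changed after deployment; only now
            # surface the tenant/user/password fields to the user.
            new_creds = prompt_user(
                "Default administrative credentials were changed since "
                "deployment. Please provide current credentials so "
                "HealthCheck can access OpenStack.")
            return keystone.Client(**new_creds)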
Re: [openstack-dev] [Ironic] Nominating David Shrewsbury to ironic-core
another +1 from /me. On Fri, Jul 11, 2014 at 3:50 PM, Devananda van der Veen devananda@gmail.com wrote: [snip - nomination quoted in full above] ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Ironic] Nominating Jim Rollenhagen to ironic-core
+1 from me. On Fri, Jul 11, 2014 at 3:50 PM, Devananda van der Veen devananda@gmail.com wrote: [snip - nomination quoted in full above] ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On Jul 11, 2014 5:32 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: I have tried using pymysql in place of mysqldb and in real world concurrency tests against cinder and nova it performs slower. I was inspired by the mention of mysql-connector so I just tried that option instead. Mysql-connector seems to be slightly slower as well, which leads me to believe that the blocking inside of Do you have some numbers? Seems to be slightly slower doesn't really stand up as an argument against the numbers that have been posted in this thread. sqlalchemy is not the main bottleneck across projects. Vish P.S. The performance in all cases was abysmal, so performance work definitely needs to be done, but just the guess that replacing our mysql library is going to solve all of our performance problems appears to be incorrect at first blush. The motivation is still mostly deadlock relief but more performance work should be done. I agree with you there. I'm still hopeful for some improvement from this. On Jul 11, 2014, at 10:20 AM, Clark Boylan clark.boy...@gmail.com wrote: Before we get too far ahead of ourselves, mysql-connector is not hosted on pypi. Instead it is an external package link. We recently managed to remove all packages that are hosted as external package links from openstack and will not add new ones in. Before we can use mysql-connector in the gate oracle will need to publish mysql-connector on pypi properly. That said there is at least one other pure python alternative, PyMySQL. PyMySQL supports py3k and pypy. We should look at using PyMySQL instead if we want to start with a reasonable path to getting this in the gate. Clark On Fri, Jul 11, 2014 at 10:07 AM, Miguel Angel Ajo Pelayo mangel...@redhat.com wrote: +1 here too, Amazed with the performance gains, x2.4 seems a lot, and we'd get rid of deadlocks. - Original Message - +1 I'm pretty excited about the possibilities here. I've had this mysqldb/eventlet contention in the back of my mind for some time now. I'm glad to see some work being done in this area. Carl On Fri, Jul 11, 2014 at 7:04 AM, Ihar Hrachyshka ihrac...@redhat.com wrote: On 09/07/14 13:17, Ihar Hrachyshka wrote: Hi all, Multiple projects are suffering from db lock timeouts due to deadlocks deep in mysqldb library that we use to interact with mysql servers. In essence, the problem is due to missing eventlet support in mysqldb module, meaning when a db lock is encountered, the library does not yield to the next green thread, allowing other threads to eventually unlock the grabbed lock, and instead it just blocks the main thread, that eventually raises timeout exception (OperationalError). The failed operation is not retried, leaving failing request not served. In Nova, there is a special retry mechanism for deadlocks, though I think it's more a hack than a proper fix. Neutron is one of the projects that suffer from those timeout errors a lot. Partly it's due to lack of discipline in how we do nested calls in l3_db and ml2_plugin code, but that's not something to change in foreseeable future, so we need to find another solution that is applicable for Juno. Ideally, the solution should be applicable for Icehouse too to allow distributors to resolve existing deadlocks without waiting for Juno. We've had several discussions and attempts to introduce a solution to the problem. Thanks to oslo.db guys, we now have more or less clear view on the cause of the failures and how to easily fix them.
The solution is to switch mysqldb to something eventlet aware. The best candidate is probably MySQL Connector module that is an official MySQL client for Python and that shows some (preliminary) good results in terms of performance. I've made additional testing, creating 2000 networks in parallel (10 thread workers) for both drivers and comparing results. With mysqldb: 215.81 sec With mysql-connector: 88.66 ~2.4 times performance boost, ok? ;) I think we should switch to that library *even* if we forget about all the nasty deadlocks we experience now. I've posted a Neutron spec for the switch to the new client in Juno at [1]. Ideally, switch is just a matter of several fixes to oslo.db that would enable full support for the new driver already supported by SQLAlchemy, plus 'connection' string modified in service configuration files, plus documentation updates to refer to the new official way to configure services for MySQL. The database code won't, ideally, require any major changes, though some adaptation for the new client library may be needed. That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
Clark, You make a good point. Is there some resistance to this or is it just a matter of asking? Carl On Jul 11, 2014 12:23 PM, Clark Boylan clark.boy...@gmail.com wrote: [snip - quoted in full above]
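For illustration, the per-service part of the switch Ihar describes really is mostly the 'connection' string, since SQLAlchemy already ships dialects for all three drivers (the URLs below are placeholders):

    from sqlalchemy import create_engine

    # MySQLdb, the current default driver:
    engine = create_engine("mysql://user:pass@10.0.0.1/neutron")
    # MySQL Connector/Python:
    engine = create_engine("mysql+mysqlconnector://user:pass@10.0.0.1/neutron")
    # PyMySQL, the pure-Python alternative Clark mentions:
    engine = create_engine("mysql+pymysql://user:pass@10.0.0.1/neutron")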
Re: [openstack-dev] [Ironic] Nominating David Shrewsbury to ironic-core
Unclear if I get a vote, but if so, +1 it is. :) On July 11, 2014 4:18:55 PM PDT, Chris K nobody...@gmail.com wrote: another +1 from /me. [snip - nomination quoted in full above] // jim ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron] [third-party] Update on third party CI in Neutron
On Fri, Jul 11, 2014 at 8:56 AM, Kyle Mestery mest...@noironetworks.com wrote: 1. PLUMgrid 1. Not saving enough logs All Jenkins slaves were just updated to upload all required logs. PLUMgrid CI should be good now. Thanks, Fawad Khaliq ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 07/11/2014 08:04 AM, Ihar Hrachyshka wrote: [snip - problem statement quoted in full above] I've made additional testing, creating 2000 networks in parallel (10 thread workers) for both drivers and comparing results. With mysqldb: 215.81 sec With mysql-connector: 88.66 ~2.4 times performance boost, ok? ;) That really doesn't tell me much. Please remember that performance != scalability. If you showed the test/benchmark code, that would be great. You need to run your benchmarks at varying levels of concurrency and varying levels of read/write ratios for the workers. Otherwise it's like looking at a single dot of paint on a painting. Without looking at the patterns of throughput (performance) and concurrency/locking (scalability) with various levels of workers and read/write ratios, you miss the whole picture. Another thing to ensure is that you are using real *processes*, not threads, so that you actually simulate a real OpenStack service like Nova or Neutron, which are multi-plexed, not multi-threaded, and have a greenlet pool within each worker process. Best -jay I think we should switch to that library *even* if we forget about all the nasty deadlocks we experience now. I've posted a Neutron spec for the switch to the new client in Juno at [1]. Ideally, switch is just a matter of several fixes to oslo.db that would enable full support for the new driver already supported by SQLAlchemy, plus 'connection' string modified in service configuration files, plus documentation updates to refer to the new official way to configure services for MySQL. The database code won't, ideally, require any major changes, though some adaptation for the new client library may be needed. That said, Neutron does not seem to require any changes, though it was revealed that there are some alembic migration rules in Keystone or Glance that need (trivial) modifications. You can see how trivial the switch can be achieved for a service based on example for Neutron [2]. While this is a Neutron specific proposal, there is an obvious wish to switch to the new library globally throughout all the projects, to reduce devops burden, among other things. My vision is that, ideally, we switch all projects to the new library in Juno, though we still may leave several projects for K in case any issues arise, similar to the way projects switched to oslo.messaging during two cycles instead of one. Though looking at how easy Neutron can be switched to the new library, I wouldn't expect any issues that would postpone the switch till K. It was mentioned in comments to the spec proposal that there were some discussions at the latest summit around possible switch in context of Nova that revealed some concerns, though they do not seem to be documented anywhere. So if you know anything about it, please comment. So, we'd like to hear from other projects what's your take on that move, whether you see any issues or have concerns about it. Thanks for your comments, /Ihar [1]: https://review.openstack.org/#/c/104905/ [2]: https://review.openstack.org/#/c/105209/ ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
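A sketch of the benchmark shape Jay is asking for - real worker processes, a concurrency knob, and a read/write mix - assuming a throwaway table; this is not Ihar's or Vish's actual harness:

    import multiprocessing
    import random
    import time

    from sqlalchemy import create_engine, text

    DB_URL = "mysql+mysqlconnector://user:pass@127.0.0.1/bench"  # placeholder

    def worker(read_ratio, n_ops, results):
        # One engine per process, created after the fork.
        engine = create_engine(DB_URL)
        start = time.time()
        for _ in range(n_ops):
            with engine.begin() as conn:  # one transaction per operation
                if random.random() < read_ratio:
                    conn.execute(text("SELECT id FROM networks LIMIT 10"))
                else:
                    conn.execute(
                        text("INSERT INTO networks (name) VALUES ('x')"))
        results.put(time.time() - start)

    if __name__ == "__main__":
        for concurrency in (1, 10, 50):
            for read_ratio in (0.9, 0.5):
                results = multiprocessing.Queue()
                procs = [multiprocessing.Process(
                             target=worker, args=(read_ratio, 200, results))
                         for _ in range(concurrency)]
                for p in procs:
                    p.start()
                times = [results.get() for _ in procs]
                for p in procs:
                    p.join()
                print("workers=%d reads=%.0f%% slowest=%.2fs"
                      % (concurrency, read_ratio * 100, max(times)))

Repeating the matrix for each driver would show whether the gap holds up under contention, rather than at a single data point.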
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 7/11/14, 7:26 PM, Carl Baldwin wrote: [snip - Vish and Carl's exchange, quoted in full above] To identify performance that's alleviated by async you have to establish up front that IO blocking is the issue, which would entail having code that's blazing fast until you start running it against concurrent connections, at which point you can identify via profiling that IO operations are being serialized. This is a very specific issue. In contrast, to identify why some arbitrary openstack app is slow, my bet is that async is often not the big issue. Every day I look at openstack code and talk to people working on things, I see many performance issues that have nothing to do with concurrency, and as I detailed in my wiki page at https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy there is a long road to cleaning up all the excessive queries, hundreds of unnecessary rows and columns being pulled over the network, unindexed lookups, subquery joins, hammering of Python-intensive operations (often due to the nature of OS apps as lots and lots of tiny API calls) that can be cached. There's a clear path to tons better performance documented there and most of it is not about async - which means that successful async isn't going to solve all those issues. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
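Two of the non-async cleanups Mike catalogues, sketched with invented model names (assume a typical Instance model with a fixed_ips relationship and an open session; this is not code from the wiki page):

    from sqlalchemy.orm import joinedload, load_only

    # Anti-pattern: one SELECT for the instances, then one more SELECT per
    # instance as the lazy-loaded relationship is touched in the loop.
    for inst in session.query(Instance).all():
        print(inst.fixed_ips)

    # Better: a single query that eager-loads the relationship and pulls
    # only the columns the caller actually reads over the network.
    instances = (session.query(Instance)
                 .options(load_only("uuid", "host"),
                          joinedload(Instance.fixed_ips))
                 .all())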
Re: [openstack-dev] [nova] Live migration objects
Dan, thank you for your reply. Regarding the following: "However, for your immediate bug, I say just have the compute host abandon the instance if the database says its host != self.host, and otherwise maybe just return it to a running state." Could you clarify what you mean by "abandon"? Would putting the instance in Error state be appropriate or did you have something else in mind? Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Containers][Nova] Containers Team Mid-Cycle Meetup to join Nova Meetup
You should probably add these details to the wiki page for the event at https://wiki.openstack.org/wiki/Sprints/BeavertonJunoSprint Unfortunately my travel is booked already, so I won't be there for the Thursday. Michael On Sat, Jul 12, 2014 at 8:31 AM, Adrian Otto adrian.o...@rackspace.com wrote: CORRECTION: This event happens July 28-31. Sorry for any confusion! Corrected Announcement: Containers Team, We have decided to hold our Mid-Cycle meetup along with the Nova Meetup in Beaverton, Oregon on July 28-31. The Nova Meetup is scheduled for July 28-30. https://www.eventbrite.com.au/e/openstack-nova-juno-mid-cycle-developer-meetup-tickets-11878128803 Those of us interested in the Containers topic will use one of the breakout rooms generously offered by Intel. We will also stay on Thursday to focus on implementation plans and to engage with those members of the Nova Team who will be otherwise occupied on July 28-30, and will have a chance to focus entirely on Containers on the 31st. Please take a moment now to register using the link above, and I look forward to seeing you there. Thanks, Adrian Otto ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 07/11/2014 09:17 PM, Mike Bayer wrote: [snip - quoted in full above] Yep, couldn't agree more. Frankly, the steps you outline in the wiki above are excellent examples of where we can make significant gains in both performance and scalability. In addition to those you listed, the underlying database schemas themselves, with the excessive use of large VARCHAR fields, BLOB fields for JSONified values, and the general bad strategy of bunching heavily-read fields with infrequently-read fields in the same tables, are also a source of poor overall database performance. Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
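A contrived declarative model (not an actual Nova or Neutron table) illustrating the schema smells Jay lists:

    from sqlalchemy import Column, Integer, String, Text
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Server(Base):
        __tablename__ = "servers"
        id = Column(Integer, primary_key=True)
        uuid = Column(String(255))      # 36 chars would do; 255 wastes space
        hostname = Column(String(255))  # hot: read on nearly every request
        system_metadata = Column(Text)  # cold JSONified blob, opaque to the
                                        # DB, yet fetched with the hot fields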
Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client
On 7/11/14, 11:26 PM, Jay Pipes wrote: [snip - quoted in full above] Well, the topic of schema modifications I actually left out of that document entirely for starters - I made a conscious choice to focus entirely on things that don't involve any apps changing any of their fundamental approaches or schemas... at least just yet! :) I'm hoping that as oslo.db improves and the patterns start to roll out, we can start working on schema design too. Because yeah, I've seen the giant lists of VARCHAR everything and just said, OK, well, we're going to have to get to that... just not right now :). ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] [Gantt] Scheduler split status (updated)
On 07/11/2014 07:14 AM, John Garbutt wrote: On 10 July 2014 16:59, Sylvain Bauza sba...@redhat.com wrote: On 10/07/2014 15:47, Russell Bryant wrote: On 07/10/2014 05:06 AM, Sylvain Bauza wrote: Hi all, === tl;dr: Now that we agree on waiting for the split prereqs to be done, we debate on if ResourceTracker should be part of the scheduler code and consequently Scheduler should expose ResourceTracker APIs so that Nova wouldn't own compute nodes resources. I'm proposing to first come with RT as Nova resource in Juno and move ResourceTracker in Scheduler for K, so we at least merge some patches by Juno. === Some debates occurred recently about the scheduler split, so I think it's important to loop back with you all to see where we are and what are the discussions. Again, feel free to express your opinions, they are welcome. Where did this resource tracker discussion come up? Do you have any references that I can read to catch up on it? I would like to see more detail on the proposal for what should stay in Nova vs. be moved. What is the interface between Nova and the scheduler here? Oh, missed the most important question you asked. So, about the interface between scheduler and Nova, the original agreed proposal is in the spec https://review.openstack.org/82133 (approved) where the Scheduler exposes: - select_destinations(): for querying the scheduler to provide candidates - update_resource_stats(): for updating the scheduler internal state (i.e. HostState) Here, update_resource_stats() is called by the ResourceTracker, see the implementations (in review) https://review.openstack.org/82778 and https://review.openstack.org/104556. The alternative that has just been raised this week is to provide a new interface where ComputeNode claims for resources and frees these resources, so that all the resources are fully owned by the Scheduler. An initial PoC has been raised here https://review.openstack.org/103598 but I tried to see what would be a ResourceTracker proxified by a Scheduler client here: https://review.openstack.org/105747. As the spec hasn't been written, the names of the interfaces are not properly defined but I made a proposal as: - select_destinations(): same as above - usage_claim(): claim a resource amount - usage_update(): update a resource amount - usage_drop(): frees the resource amount Again, this is a dummy proposal, a spec has to be written if we consider moving the RT. While I am not against moving the resource tracker, I feel we could move this to Gantt after the core scheduling has been moved. Big -1 from me on this, John. Frankly, I see no urgency whatsoever -- and actually very little benefit -- to moving the scheduler out of Nova. The Gantt project I think is getting ahead of itself by focusing on a split instead of focusing on cleaning up the interfaces between nova-conductor, nova-scheduler, and nova-compute. I see little to no long-term benefit in splitting the scheduler -- especially with its current design -- out from Nova. It's not like Neutron or Cinder, where the split-out service is providing management of a particular kind of resource (network, block storage). The Gantt project isn't providing any resource itself. Instead, it would be acting as a proxy for the placement of other services' resources, which, IMO, in and of itself, is not a reason to go through the trouble of splitting the scheduler out of Nova. I was imagining the extensible resource tracker to become (sort of) equivalent to cinder volume drivers.
The problem with the extensible resource tracker design is that it, again, just shoves a bunch of stuff into a JSON field and both the existing resource tracker code (in nova-compute) as well as the scheduler code (in nova-scheduler) then need to use and abuse this BLOB of random data. I tried to make my point on the extensible resource tracker blueprint about my objections to the design. My first, and main, objection is that there was never demonstrated ANY clear use case for the extensibility of resources that was not already covered by the existing resource tracker and scheduler. The only use case was a vague "we have out of tree custom plugins that depend on divergent behaviour and therefore we need a plugin point to change the way the scheduler thinks of a particular resource". And that is not a use case. It's simply a request to break compatibility between clouds with out-of-tree source code. My second objection was that the proposed implementation added yet more completely unnecessary complication to the scheduler, with the introduction of scheduler consumers, one for each extensible resource added as plugins to the resource tracker. The existing scheduler is already displaying a silly amount of needless complexity, and current (and approved) blueprints like this one merely add to the endless array of configuration options and ostensibly flexible behaviour. The problem with ALL of these approved and in-review specs, IMO, is
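For readers following along, here is a rough sketch of the two interface shapes being contrasted in this thread; the method names come from the discussion above, but the signatures are guesses since the second spec hasn't been written:

    # Interface from the approved split spec (review 82133).
    class SchedulerAPI(object):
        def select_destinations(self, context, request_spec,
                                filter_properties):
            """Return candidate hosts for the requested instances."""
            raise NotImplementedError

        def update_resource_stats(self, context, host_name, stats):
            """Called by the compute ResourceTracker to refresh HostState."""
            raise NotImplementedError

    # Alternative raised this week: the scheduler, not Nova, owns compute
    # node resources and hands out claims.
    class ResourceOwningSchedulerAPI(SchedulerAPI):
        def usage_claim(self, context, host_name, resources):
            """Claim a resource amount for an instance; return a claim id."""
            raise NotImplementedError

        def usage_update(self, context, claim_id, resources):
            """Update a previously claimed resource amount."""
            raise NotImplementedError

        def usage_drop(self, context, claim_id):
            """Free the claimed resources."""
            raise NotImplementedError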
Re: [openstack-dev] [Heat] [TripleO] Extended get_attr support for ResourceGroup
On 11/07/14 09:37, Tomas Sedovic wrote: Hi all, This is a follow-up to Clint Byrum's suggestion to add the `Map` intrinsic function[0], Zane Bitter's response[1] and Randall Burt's addendum[2]. Sorry for bringing it up again, but I'd love to reach consensus on this. The summary of the previous conversation: Please keep bringing it up until you get a straight answer ;) 1. TripleO is using some functionality currently not supported by Heat around scaled-out resources 2. Clint proposed a `map` intrinsic function that would solve it 3. Zane said Heat has historically been against for-loop functionality 4. Randall suggested ResourceGroup's attribute passthrough may do what we need I've looked at the ResourceGroup code and experimented a bit. It does do some of what TripleO needs, but not all. Many thanks for putting this together, Tomas, this is exactly the kind of information that is _incredibly_ helpful in knowing what sort of features we need in HOT. Fantastic work :) Here's what we're doing with our scaled-out resources (what we'd like to wrap in a ResourceGroup or similar in the future): 1. Building a comma-separated list of RabbitMQ nodes: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L642 This one is easy with ResourceGroup's inner attribute support: list_join: - , - {get_attr: [controller_group, name]} (controller_group is a ResourceGroup of Nova servers) 2. Get the name of the first Controller node: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L339 Possible today: {get_attr: [controller_group, resource.0.name]} 3. List of IP addresses of all controllers: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L405 We cannot do this, because resource group doesn't support extended attributes. Would need something like: {get_attr: [controller_group, networks, ctlplane, 0]} (ctlplane is the network controller_group servers are on) I was going to give an explanation of how we could implement this, but then I realised a patch was going to be easier: https://review.openstack.org/#/c/106541/ https://review.openstack.org/#/c/106542/ 4. IP address of the first node in the resource group: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/swift-deploy.yaml#L29 Can't do: extended attributes are not supported for the n-th node of the group either. I believe this is possible today using: {get_attr: [controller_group, resource.0.networks, ctlplane, 0]} This can be solved by `get_resource` working with resource IDs: get_attr: - {get_attr: [controller_group, resource.0]} - [networks, ctlplane, 0] (i.e. we get the server's ID from the ResourceGroup and change `get_attr` to work with IDs too. Would also work if `get_resource` understood IDs). This is never going to happen. Think of get_resource as returning an object whose string representation is the UUID of the named resource (get_attr is similar, but returning attributes instead). It doesn't mean that having the UUID of a resource is the same as having the resource itself; the UUID could have come from anywhere. What you're talking about is a radical departure from the existing, very simple but extremely effective, model toward something that's extremely difficult to analyse with lots of nasty edge cases.
It's common for people to think they want this, but it always turns out there's a better way to achieve their goal within the existing data model. Alternatively, we could extend the ResourceGroup's get_attr behaviour: {get_attr: [controller_group, resource.0.networks.ctlplane.0]} but the former is a bit cleaner and more generic. I wrote a patch that implements this (and also handles (3) above in a similar manner), but in the end I decided that this: {get_attr: [controller_group, resource.0, networks, ctlplane, 0]} would be better than either that or the current syntax (which was obviously obscure enough that you didn't discover it). My only reservation was that it might make things a little weird when we have an autoscaling API to get attributes from compared with the dotted syntax that you suggest, but I soon got over it ;) --- That was the easy stuff, where we can get by with the current functionality (plus a few fixes). What follows are examples that really need new intrinsic functions (or seriously complicating the ResourceGroup attribute code and syntax). 5. Building a list of {ip: ..., name: ...} dictionaries to configure haproxy: https://github.com/openstack/tripleo-heat-templates/blob/a7f2a2c928e9c78a18defb68feb40da8c7eb95d6/overcloud-source.yaml#L478 This really calls for a mapping/for-each kind of functionality. Trying to invent a ResourceGroup syntax for this