[openstack-dev] [nova] python-novaclient: uses deprecated keyring.backend.$keyring
Hi, Someone sent a bug report against the python-novaclient package: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=728470 Could someone take care of this? FYI, the patch attached to the bug report seems wrong, according to mitya57 in #debian-python (in OFTC), though the problem is real and needs to be addressed, and I don't have the time to investigate it myself right now. Cheers, Thomas Goirand (zigo) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
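For context, newer keyring releases moved the concrete backends out of the keyring.backend module, which is what triggers the deprecation warning. A minimal sketch of the portable pattern, using only the stable module-level keyring API (the service name and helper functions are illustrative, not novaclient's actual code):

    import keyring

    SERVICE = "novaclient_auth"  # illustrative service name

    def save_token(username, token):
        # Stores the token in whatever backend keyring selected; no need
        # to instantiate a keyring.backend.* class directly.
        keyring.set_password(SERVICE, username, token)

    def load_token(username):
        # Returns None if nothing is stored for this user.
        return keyring.get_password(SERVICE, username)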
Re: [openstack-dev] tenant or project
+2 This kind of confusion is actually very bad from an external perspective and from a user perspective. Should this just get resolved by the TC once and for all? I remember this same project vs tenant question happening like 2 years ago (maybe less) and it makes us all look sort of mad if we are having it again (especially since it impacts so many components: clients and code, code comments, docs...). Sent from my really tiny device... On Nov 23, 2013, at 10:52 AM, Tim Bell tim.b...@cern.ch wrote: To be clear, I don’t care Tenant vs Project. However, I do care that we should not continue this confusion. One or the other… but not both, and a plan to deprecate the other. Naturally, at least 1 release of backwards compatibility for environment variables or APIs. Tim From: Dean Troyer [mailto:dtro...@gmail.com] Sent: 23 November 2013 19:03 To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] tenant or project On Sat, Nov 23, 2013 at 11:35 AM, Dolph Mathews dolph.math...@gmail.com wrote: +1 for using the term project across all services. Projects provide multi-tenant isolation for resources across the cloud. Part of the reason we prefer projects in keystone is that domains conceptually provide multi-tenant isolation within keystone itself, so the overloaded tenant terminology gets really confusing. - keystoneclient already supports projects from a library perspective (including auth_token) Thank you! I will eventually be able to remove my disparaging comments and work-arounds in OSC for tenantId vs tenant_id!!! - keystoneclient's CLI is deprecated in favor of openstackclient's CLI, which supports the project terminology if you pass the --identity-api-version=3 flag FWIW I followed Horizon's lead in OSC and removed the term 'tenant' from all user-visible parts, except for the compatibility OS_TENANT_{ID,NAME} variables and --os-tenant-{id,name} options. Neither of those is documented anywhere though. This includes commands for all OS APIs it supports. dt -- Dean Troyer dtro...@gmail.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
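For reference, the terminology split shows up directly in the client libraries. A minimal sketch, assuming keystoneclient's v2.0 and v3 Client constructors accept these keyword credentials (the credential values and URLs are placeholders):

    # v2.0 API: the old "tenant" terminology.
    from keystoneclient.v2_0 import client as client_v2
    ks2 = client_v2.Client(username="demo", password="secret",
                           tenant_name="demo",   # tenant_* keywords
                           auth_url="http://keystone:5000/v2.0")

    # v3 API: the same concept is a "project".
    from keystoneclient.v3 import client as client_v3
    ks3 = client_v3.Client(username="demo", password="secret",
                           project_name="demo",  # project_* keywords
                           auth_url="http://keystone:5000/v3")

Until one term wins, every consumer carries both spellings, which is exactly the duplication being objected to above.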
Re: [openstack-dev] [Solum] git-Integration working group
Excerpts from Adrian Otto's message of 2013-11-22 18:51:16 -0800: Monty, On Nov 22, 2013, at 6:24 PM, Monty Taylor mord...@inaugust.com wrote: On 11/22/2013 05:37 PM, Krishna Raman wrote: Hello all, I would like to kick off the Git integration discussion. Goal of this subgroup is to go through the git-integration blueprint [1] and break it up into smaller blueprints that we can execute on. We have to consider 2 workflows: 1) For Milestone 1, pull based git workflow where a user uses a public git repository (possibly on github) to trigger the build 2) For later milestones, a push based workflow where the git repository is maintained by Solum Hi! Hi, thanks for chiming in here. I'm a little disappointed that we've decided to base the initial workflow on something that is not related to the world-class git-based developer tooling that the OpenStack project has already produced. We have a GIANT amount of tooling in this space, and it's all quite scalable. There is also the intent by 3 or 4 different groups to make it more re-usable/re-consumable, including thoughts in making sure that we can drive it from and have it consume heat. The initial work will be something pretty trivial. It's just a web hook on a git push. The workflow in this case is not customizable, and has basically no features. The intent is to iterate on this to make it much more compelling over time, soon after the minimum integration, we will put a real workflow system in place. We did discuss Zuul and Nodepool, and nobody had any objection to learning more about those. This might be a bit early in our roadmap to be pulling them in, but if there is an easy way to use them early in our development, we'd like to explore that. Zuul and nodepool are things to optimize large scale testing. git-review and gerrit, on the other hand, are the frontend that it sounds like this trivial just a push process would try to replace. I don't think it is wise to ignore the success of those two pieces of the OpenStack infrastructure. If what you're doing is only ever going to be a simple push, so be it. However, I doubt that it will remain so simple. Is there some reason _not_ to just consume these as-is? Devdatta has created 2 blueprints for consideration: [2] [3] I have set up a doodle to poll for a /recurring/ meeting time for this subgroup: http://doodle.com/7wypkzqe9wep3d33#table (Timezone support is enabled) Currently the plan is to try G+ hangouts to run these meetings and scribe on #solum. This will limit us to a max of 10 participants. If we have more interest, we will need to see how to change the meetings. We have IRC meeting channels for meetings. They are logged - and they have the benefit that they do not require non-Open Source software to access. If you have them in IRC, the team from OpenStack who is already working on developer workflow around git can potentially participate. I don't mean to be negative, but if you want to be a PaaS for OpenStack, I would strongly consider not using G+ when we have IRC, and I would strongly suggest engaging with the Infra projects that already know how to do git-based workflow and action triggering. We just finished holding the Solum Community Design Workshop in San Francisco. We had both irc and G+ in addition to etherpad for shared notetaking. What we found is that collaboration was faster and more effective when we used the G+ tool. The remote participants had a strong preference for it, and requested that we use it for the breakout meetings as well.
The breakout design meetings will have a scribe who will transcribe the interaction in IRC so it will also be logged. We struggled with this in Ubuntu as well. Ultimately, our fine friends at Google have created what seems to be one of the most intuitive distributed collaboration tools the world has ever seen. I think Monty is right, and that we should strive to use the tools the rest of OpenStack uses whenever possible, and we should strive to be 100% self hosted. However, I do think G+ has enough benefit at times to deal with the fact that it is not free and we can't really host our own. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Mission Statement
On 11/18/2013 09:26 PM, Russell Bryant wrote: On 11/18/2013 09:10 PM, Stuart Fox wrote: Hey * Does ironic fall under the Compute banner? If so, the statement needs a little tweak. No. On 2013-11-18 5:55 PM, Russell Bryant rbry...@redhat.com wrote: [1] http://git.openstack.org/cgit/openstack/governance/tree/reference/programs.yaml There is a separate bare metal program: Right - but the end user still interacts with that via nova - so even if nova doesn't do the pxe/ipmi calls, Ironic will still have a nova driver. I bring this up because it seems to me that you should either remove the list of examples of compute resources, or change "including" to be "including but not limited to" Bare metal: codename: Ironic ptl: Devananda van der Veen (devananda) mission: To produce an OpenStack service and associated python libraries capable of managing and provisioning physical machines, and to do this in a security-aware and fault-tolerant manner. url: https://wiki.openstack.org/wiki/Ironic ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] [Infra] Support for PCI Passthrough
On 23 November 2013 08:43, Jeremy Stanley fu...@yuggoth.org wrote: On 2013-11-22 08:59:16 +0000 (+0000), Tan, Lin wrote: [...] In the near term, your best bet is to run your own test infrastructure supporting the hardware features you require and report advisory results back on proposed changes: http://ci.openstack.org/third_party.html For a longer term solution, you may want to consult with the TripleO project with regards to their bare-metal test plan: https://wiki.openstack.org/wiki/TripleO/TripleOCloud I think using the donated resources to perform this sort of testing is an ideal example of the value the TripleO cloud can bring to OpenStack as a whole. I don't know if we have the necessary hardware (I'm fairly sure we have VT-d, but I'm not 100% sure we have anything set up for SR-IOV). If we do, then cool - please come and work with us to get that testing what you need. A key consideration will be whether you want checking or gating. For gating or infra run checking there need to be two regions (which the TripleO cloud is aiming at) and infra running the tests; for checking without infra running it the third-party system is a good mechanism (and that can be run from a single TripleO region too, in principle). -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime
I think everyone agrees we need a unified scheduler. Boris' approach is to move the storage to memcache so the DB is no longer part of the picture, and then move the scheduler out of Nova, rinse and repeat for Cinder etc. My approach is to move the scheduler out of Nova, and then let folk do whatever they need/want to do to the now separate scheduler - and refine the interface subsequently as well. I think both approaches are doable, and can even be done in parallel (just keep the code that moves synced). My concern about moving the data store /first/ is that that is secondary to the key goal of moving it out of tree, and we work on a fast time scale here :). -Rob On 23 November 2013 06:01, Khanh-Toan Tran khanh-toan.t...@cloudwatt.com wrote: Dear all, I'm very interested in this subject as well. Actually there is also a discussion of the possibility of an independent scheduler in the mailing list: http://lists.openstack.org/pipermail/openstack-dev/2013-November/019518.html Would it be possible to discuss about this subject in the next Scheduler meeting Nov 26th? Best regards, Toan - Original Message - From: Mike Spreitzer mspre...@us.ibm.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Sent: Friday, November 22, 2013 4:58:46 PM Subject: Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime I'm still a newbie here, so can not claim my Nova skills are even modest. But I'd like to track this, if nothing more. Thanks, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime
Robert, Btw, I would like to be a volunteer too=) Best regards, Boris Pavlovic On Sun, Nov 24, 2013 at 10:43 PM, Robert Collins robe...@robertcollins.net wrote: On 22 November 2013 23:55, Gary Kotton gkot...@vmware.com wrote: I'm looking for 4-5 folk who have: - modest Nova skills - time to follow a fairly mechanical (but careful and detailed work needed) plan to break the status quo around scheduler extraction I would be happy to take part. But prior I think that we need to iron out a number of issues: Cool! Added your name to the list of volunteers, which brings us to 4, the minimum I wanted before starting things happening. 1. Will this be a new service that has an API, for example will Nova be able to register a host and provide the host statistics. This will be an RPC api initially, because we know the performance characteristics of the current RPC API, and doing anything different to that is unnecessary risk. Once the new structure is: * stable * gated with unit and tempest tests * with a straightforward and well documented migration path for deployers Then adding a RESTful API could take place. 2. How will the various components interact with the scheduler - same as today - that is RPC? Or a REST API? The latter is a real concern due to problems we have seen with the interactions of nova and other services RPC initially. REST *only* once we've avoided second system syndrome. 3. How will current developments fit into this model? Code sync - take a forklift copy of the code, and apply patches to both for the one cycle. All in all I think that it is a very good and healthy idea. I have a number of reservations - these are mainly regarding the implementation and the service definition. Basically I like the approach of just getting heads down and doing it, but prior to that I think that we just need to understand the scope and mainly define the interfaces and how they can be used/abused and consumed. It may be a very good topic to discuss at the upcoming scheduler meeting - this may be in the middle of the night for Robert. If so then maybe we can schedule another time. Tuesdays at 1500 UTC - I'm in UTC+13 at the moment, so thats 0400 local. A little early for me :) I'll ping you on IRC about resolving the concerns you raise, and you can proxy my answers to the sub group meeting? Please note that this is scheduling and not orchestration. That is also something that we need to resolve. Yup, sure is. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Keystone][Oslo] Future of Key Distribution Server, Trusted Messaging
I hear a concerted effort to get this bootstrapped in Keystone. We can do this if it is the voice of the majority. If we do: Keep KDS configuration separate from the Keystone configuration: the fact that they both point to the same host and port is temporary. In fact, we should probably spin up a separate wsgi service/port inside Keystone for just the KDS. This is not hard to do, and will support splitting it off into its own service. +1 on spinning up a new service/wsgi KDS should not show up in the Service catalog. It is not an end user visible service and should not look like one to the rest of the world. I believe that KDS should be discoverable, but I agree that it is not an end user service, so I am unsure of the best approach wrt the catalog. The other concern is the library interfacing with KDS (I would assume this goes into keystoneclient? At least for the time being). Once we have it up and running, we can move it to its own service or hand off to Barbican when appropriate. Are people OK with the current API implementation? I didn't see a lot of outside comment on the code review, and there were certainly some aspects of it that were unclear. I think the API is, if not ready to go, very close (maybe a single cleanup revision). If we are going to do this let's get the spec done ASAP and get the code in right away so we can get traction on it. Icehouse milestones will be coming through fast. I think it is eminently possible to have this in the repo and running fairly quickly with concerted effort. The code might need minor tweaking to conform to the spec if it changes. But as I recall almost 100% of the back and forth at this point was does it belong in keystone. https://review.openstack.org/#/c/40692/ --Morgan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] python-novaclient: uses deprecated keyring.backend.$keyring
Hi Thomas, How pressing is this issue? I know there is work being done to unify token/auth implementation across the clients. I want to have an idea of the heat here so we can look at addressing this directly in novaclient if it can't wait until the unification work to come down the line. (Sent from mobile so I haven't been able to look up more specifics) Cheers, --Morgan On Sunday, November 24, 2013, Thomas Goirand wrote: Hi, Someone sent a bug report against the python-novaclient package: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=728470 Could someone take care of this? FYI, the patch attached to the bug report seems wrong, according to mitya57 in #debian-python (in OFTC), though the problem is real and needs to be addressed, and I don't have the time to investigate it myself right now. Cheers, Thomas Goirand (zigo) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Keystone][Oslo] Future of Key Distribution Server, Trusted Messaging
On Sun, Nov 24, 2013 at 1:52 PM, Morgan Fainberg m...@metacloud.com wrote: The other concern is the library interfacing with KDS (I would assume this goes into keystoneclient? At least for the time being). I would rather see the client get its own repo, too. We still need to do that with the middleware. dt -- Dean Troyer dtro...@gmail.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Solum] git-Integration working group
On 11/22/2013 09:51 PM, Adrian Otto wrote: [snip] The initial work will be something pretty trivial. It's just a web hook on a git push. The workflow in this case is not customizable, and has basically no features. The intent is to iterate on this to make it much more compelling over time, soon after the minimum integration, we will put a real workflow system in place. We did discuss Zuul and Nodepool, and nobody had any objection to learning more about those. This might be a bit early in our roadmap to be pulling them in, but if there is an easy way to use them early in our development, we'd like to explore that. A web hook on a git push seems trivial, but is not. Something has to run the git push - and as you said, you're planning that thing to be github. That means that, out of the gate, solum will not work without deferring to a non-free service. Alternately, you could install one of the other git servers, such as gitlab - but now you're engineering that. Also, you need to have something receive the payload of the webhook. Now - work is unavoidable, and is a good thing! But if we can avoid work over in the corner that can't be reused elsewhere, that would be neat. What I suggest is that you start with zuul as the engine that does things. It's pretty good at it. AND - there is a todo list item to add a github web hook receiver to it. If you guys added that (and I'd be happy to point you in the right direction) then you'd have a thing out of the gate that would be pluggable, in that it could respond to both github and non-github type events, and you'd be giving back some functionality to the very-git-oriented tooling of the OpenStack project. The zuul work in question wants to be able to respond to both pull requests and new refs landing to a branch, so I'm pretty sure it's what you want. That's the trigger side. On the launcher side, one could imagine writing a heat launcher, or an oslo.messaging launcher. Currently, we use gearman for message communication with the backends that take the action based on the incoming events. Clearly that's not the right choice for an OpenStack service, but it's got modular triggers for a reason. :) Also, if you do that work, writing one that would respond to webhooks from bitbucket is probably an additional 10 minutes of work.
[snip] The breakout design meetings will have a scribe who will transcribe the interaction in IRC so it will also be logged. I'm less concerned with logging of it than I am with my ability to participate.
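To make the "something to receive the payload of the webhook" concrete: a minimal sketch of a push-event receiver, assuming a GitHub-style JSON push payload with "ref" and "repository" fields (the port, field handling, and printed output are illustrative, not Zuul's or Solum's actual interface):

    import json
    from wsgiref.simple_server import make_server

    def application(environ, start_response):
        # Read the POSTed webhook body and pull out the repo and ref.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        payload = json.loads(body or "{}")
        repo = payload.get("repository", {}).get("url")
        ref = payload.get("ref")  # e.g. "refs/heads/master"
        print("push event: repo=%s ref=%s" % (repo, ref))  # trigger a build here
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    if __name__ == "__main__":
        make_server("", 8080, application).serve_forever()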
Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime
On 25 November 2013 08:08, Boris Pavlovic bpavlo...@mirantis.com wrote: Robert, Btw, I would like to be a volunteer too=) Best regards, Boris Pavlovic Awesome, added to the etherpad! -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova][Glance] Support of v1 and v2 glance APIs in Nova
On Fri, 2013-11-22 at 20:07 -0600, Matt Riedemann wrote: On Friday, November 22, 2013 5:52:17 PM, Russell Bryant wrote: On 11/22/2013 06:01 PM, Christopher Yeoh wrote: On Sat, Nov 23, 2013 at 8:33 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: ... 21:51:42 dolphm i just hope that the version discovery mechanism is smart enough to realize when it's handed a versioned endpoint, and happily run with that ... 21:52:00 dolphm (by calling that endpoint and doing proper discovery) ... 21:52:24 russellb dolphm: yeah, need to handle that gracefully ... Just one point here (and perhaps I'm misunderstanding what was meant), but if the catalog points to a versioned endpoint shouldn't we just use that version rather than trying to discover what other versions may be available. Although we'll have cases of it just being set to a versioned endpoint because that's how it has been done in the past, I think we should be making the assumption that if we're pointed to a specific version, that is the one we should be using. Agreed, and I think that's what Dolph and I meant. That also covers the override case that was expressed a few different times in this thread, giving the admin the ability to pin his environment to the version he knows and trusts during, for example, upgrades, and then slowly transitioning to a newer API. The nice thing with that approach is it should keep config options with hard-coded versions out of nova.conf which is what was being proposed in the glance and cinder v2 blueprint patches. So the way we have this in keystone at least is that querying GET / will return all available API versions and querying /v2.0 for example is a similar result with just the v2 endpoint. So you can hard pin a version by using the versioned URL. I spoke to somebody the other day about the discovery process in services. The long term goal should be that the service catalog contains unversioned endpoints and that all clients should do discovery. For keystone the review has been underway for a while now: https://review.openstack.org/#/c/38414/ the basics of this should be able to be moved into Oslo for other projects if required. Jamie -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
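A minimal sketch of the client-side behaviour being agreed on here - use a versioned endpoint as-is, otherwise discover - assuming only that GET on the unversioned endpoint returns a Keystone-style JSON version document (the versioned-URL test is a crude heuristic and the parsing details are an assumption of this sketch):

    import requests

    def pick_version(endpoint, wanted="v3"):
        # A versioned endpoint (e.g. ".../v2.0") is used as-is; an
        # unversioned one is asked for its version list first.
        if endpoint.rstrip("/").rsplit("/", 1)[-1].startswith("v"):
            return endpoint
        doc = requests.get(endpoint).json()
        # Keystone nests the list as {"versions": {"values": [...]}}.
        for version in doc.get("versions", {}).get("values", []):
            if version.get("id", "").startswith(wanted):
                for link in version.get("links", []):
                    if link.get("rel") == "self":
                        return link["href"]
        raise RuntimeError("no %s endpoint found at %s" % (wanted, endpoint))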
[openstack-dev] [TripleO] Summit session wrapup
I've now gone through and done the post summit cleanup of blueprints and migration of design docs into blueprints as appropriate. We had 50-odd blueprints, many of which were really not effective blueprints - they described single work items with little coordination need, were not changelog items, etc. I've marked those obsolete. Blueprints are not a discussion forum - they are a place that [some] discussions can be captured, but anything initially filed there will take some time before folk notice it - and the lack of a discussion mechanism makes it very hard to reach consensus there. Could TripleO interested folk please raise things here, on the dev list initially, and we'll move it to lower latency // higher bandwidth environments as needed? From the summit we had the following outcomes https://etherpad.openstack.org/p/icehouse-deployment-hardware-autodiscovery - needs to be done in ironic https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-modelling-infrastructure-sla-services - needs more discussion to tease concerns out - in particular I want us to get to a problem statement that Nova core folk understand :) https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-ha-production-configuration - this is ready for folk to act on at any point https://blueprints.launchpad.net/tripleo/+spec/tripleo-tuskar-deployment-scaling-topologies - this is ready for folk to act on - but it's fairly shallow, since most of the answer was 'discuss with heat' :) https://blueprints.launchpad.net/tripleo/+spec/tripleo-icehouse-scaling-design - this is ready for folk to act on; the main thing was gathering a bunch of data so we can make good decisions from here on out The stable branches decision has been documented in the wiki - all done. Cheers, Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Oslo] Improving oslo-incubator update.py
Just a thought, but couldn't the changes of a module update be calculated by comparing the last commit dates of the source and target module? For instance, if module A's update patch for Nova was uploaded on date XX then we can filter out the changes from XX to present and print them out for the author to paste in the commit message when running update.py for module A. This way we might not need any changes to the openstack-modules.conf? On Sun, Nov 24, 2013 at 12:54 AM, Doug Hellmann doug.hellm...@dreamhost.com wrote: Thanks for the reminder, Sandy. https://bugs.launchpad.net/oslo/+bug/1254300 On Sat, Nov 23, 2013 at 9:39 AM, Sandy Walsh sandy.wa...@rackspace.com wrote: Seeing this thread reminded me: We need support in the update script for entry points in oslo setup.cfg to make their way into the target project. So, if update is getting some love, please keep that in mind. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- *Intel SSG/STO/DCST/CIT* 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai, China +862161166500 ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
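A minimal sketch of that date-based filtering, using plain git via subprocess (the repo and module paths in the example comment are placeholders, not the real layout):

    import subprocess

    def last_commit_date(repo, path):
        # ISO-style date of the newest commit touching `path` in `repo`.
        out = subprocess.check_output(
            ["git", "log", "-1", "--format=%ci", "--", path], cwd=repo)
        return out.decode().strip()

    def changes_since(repo, path, since):
        # One-line subjects of commits newer than `since`, oldest first,
        # ready to paste into an update commit message.
        out = subprocess.check_output(
            ["git", "log", "--reverse", "--format=%h %s",
             "--since", since, "--", path], cwd=repo)
        return [line for line in out.decode().splitlines() if line]

    # e.g. list oslo-incubator changes newer than Nova's copy of a module:
    # cutoff = last_commit_date("nova", "nova/openstack/common/local.py")
    # for line in changes_since("oslo-incubator",
    #                           "openstack/common/local.py", cutoff):
    #     print(line)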
Re: [openstack-dev] [Nova] [Infra] Support for PCI Passthrough
On 25 November 2013 02:13, Robert Collins wrote: On 23 November 2013 08:43, Jeremy Stanley fu...@yuggoth.org wrote: On 2013-11-22 08:59:16 +0000 (+0000), Tan, Lin wrote: [...] In the near term, your best bet is to run your own test infrastructure supporting the hardware features you require and report advisory results back on proposed changes: http://ci.openstack.org/third_party.html For a longer term solution, you may want to consult with the TripleO project with regards to their bare-metal test plan: https://wiki.openstack.org/wiki/TripleO/TripleOCloud I think using the donated resources to perform this sort of testing is an ideal example of the value the TripleO cloud can bring to OpenStack as a whole. I don't know if we have the necessary hardware (I'm fairly sure we have VT-d, but I'm not 100% sure we have anything set up for SR-IOV). If we do, then cool - please come and work with us to get that testing what you need. A key consideration will be whether you want checking or gating. For gating or infra run checking there need to be two regions (which the We should want checking and gating; we definitely should put effort into it. It seems a fairly straightforward solution for such testing. Yongli He (Pauli He) TripleO cloud is aiming at) and infra running the tests; for checking without infra running it the third-party system is a good mechanism (and that can be run from a single TripleO region too, in principle). -Rob ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Solum] git-Integration working group
I think I mentioned this before, but the working group has not made any binding decisions. We already identified Zuul and Nodepool as tools to learn more about. It's not clear to us yet if it makes sense for the simple use case. That will be one of the things we discuss. We do know that Solum requires multi-tenant capability from these tools. They were designed for single-tenant use. We have not yet scoped the effort required to add that. One thing that could be valuable before holding the first meeting is a written briefing on these tools. Someone who understands them should help with this, or we should do some research. I will be requesting help from a small research team to put something together, so we can study up and arrive well informed. Volunteers welcome. -- Adrian Original message From: Clint Byrum Date: 11/24/2013 10:00 AM (GMT-08:00) To: openstack-dev Subject: Re: [openstack-dev] [Solum] git-Integration working group [snip]
[openstack-dev] [Glance] Regarding Glance's behaviour when updating an image ...
Hi All, Newbie stacker here ... I have a basic question regarding the intended behaviour of Glance's image update API: What is the intended behaviour of Glance when updating an already uploaded image file? The functional test indicates that the intended behaviour is to disallow such updates: glance/tests/v2/test_images.py:test_image_lifecycle:210 # Uploading duplicate data should be rejected with a 409 ... When I configure Glance to use the local filesystem backend I do indeed get a 409 conflict, but when I configure Glance to use Swift as the backend the operation succeeds and the original image file is replaced. On a related note, when using the local filesystem backend, though I get a 409 conflict, it leaves the image in the saving state - I think it shouldn't change the state of the image. There's a bug logged regarding this behaviour (bug 1241379) and I'm working on the fix. But in light of the above question perhaps I should file another bug regarding the Swift storage backend? -- Koo ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
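A minimal sketch of the check being described, driving the Glance v2 upload API directly (the endpoint, token, and image id are placeholders):

    import requests

    GLANCE = "http://glance:9292"           # placeholder endpoint
    HEADERS = {"X-Auth-Token": "TOKEN",     # placeholder token
               "Content-Type": "application/octet-stream"}

    def upload(image_id, data):
        return requests.put("%s/v2/images/%s/file" % (GLANCE, image_id),
                            headers=HEADERS, data=data)

    first = upload("IMAGE_ID", b"image-bits")
    assert first.status_code == 204          # initial upload accepted

    second = upload("IMAGE_ID", b"image-bits")
    # The intended behaviour under discussion: a second upload of data for
    # the same image should be rejected, not silently replace the original.
    assert second.status_code == 409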
[openstack-dev] Multidomain User Ids
The #1 pain point I hear from people in the field is that they need to consume read-only LDAP but have service users in something Keystone specific. We are close to having this, but we have not closed the loop. This was something that was Henry's to drive home to completion. Do we have a plan? Federation depends on this, I think, but this problem stands alone. Two solutions: 1. Always require the domain ID along with the user ID for role assignments. 2. Provide some way to parse from the user ID what domain it is from. I was thinking that we could do something along the lines of 2, where we provide a domain-specific user_id prefix. For example, if there is just one LDAP service, and they wanted to prefix anything out of LDAP with ldap@, then an ID would be the prefix plus the ID field from LDAP. This would be configured on a per-domain basis, and would be optional. The weakness is that it would be log N to determine which domain a user_id came from. A better approach would be to use a divider, like '@', and then the prefix would be the key for a hashtable lookup. Since it is optional, domains could still be stored in SQL and user_ids could be UUIDs. One problem is if someone comes by later and must use an email address as the userid; the @ would mess them up. So the default divider should be something URL-safe but not likely to be part of a userid. I realize that it might be impossible to match this criterion. Actually, there might be other reasons to forbid @ signs from IDs, as they look like phishing attempts in URLs. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
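A minimal sketch of option 2 with the divider variant, showing the O(1) hash lookup versus scanning prefixes (the divider, mapping, and domain names are illustrative):

    DIVIDER = "@"  # configurable; should be URL-safe, unlikely in real ids

    # prefix -> domain id, built from per-domain configuration
    PREFIX_TO_DOMAIN = {"ldap": "corp_ldap_domain"}
    DEFAULT_DOMAIN = "default"  # plain UUID ids stay in the SQL-backed domain

    def split_user_id(user_id):
        # Returns (domain_id, local_user_id). The divider makes this a
        # single hashtable lookup rather than a search over all prefixes.
        prefix, divider, local_id = user_id.partition(DIVIDER)
        if divider and prefix in PREFIX_TO_DOMAIN:
            return PREFIX_TO_DOMAIN[prefix], local_id
        return DEFAULT_DOMAIN, user_id

    # split_user_id("ldap@jdoe")  -> ("corp_ldap_domain", "jdoe")
    # split_user_id("a1b2c3d4")   -> ("default", "a1b2c3d4")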
Re: [openstack-dev] RFC: Potential to increase min required libvirt version to 0.9.11 ?
On 2013-11-22 20:29:10 +0000 (+0000), Jeremy Stanley wrote: [...] At the moment, we're still looking for confirmation that nova-compute no longer locks up with the latest libvirt in UCA (1.1.1). [...] I emulated a full (parallel) tempest-devstack-vm-full run on several fresh 12.04 LTS VMs with Ubuntu Cloud Archive enabled and confirmed python-libvirt 1.1.1 was pulled in. I'm consistently getting numerous tempest tests failing with server in error state messages on the console. Someone with more nova debugging experience should probably repeat the same experiment. -- Jeremy Stanley ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [Mistral] Community meeting agenda - 11/25/2013
Hi, Today we will have another Mistral IRC Community Meeting at 16.00 UTC (#openstack-meeting). Here’s the agenda: (1) review last week's action items, (2) discuss the API spec, (3) discuss the DSL, (4) discuss the PoC design, (5) open discussion. You can also find this agenda and the links to the previous meetings at https://wiki.openstack.org/wiki/Meetings/MistralAgenda. Feel free to join us and discuss all the Mistral activities! Thanks. Renat Akhmerov @ Mirantis Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Neutron][IPv6] Meeting logs from the first IRC meeting
Hi, guys: I took the action item to submit the code for review based on our discussion during the first IRC meeting. Here is the link: https://review.openstack.org/#/c/58186/ The changes we proposed are based on the POC work my friend (Randy Tuttle) and I did back in June on Grizzly and most recent update on Havana in Oct. If you are interested in the whitepaper, please unicast us. Thanks! Shixiong On Nov 21, 2013, at 5:34 PM, Collins, Sean (Contractor) sean_colli...@cable.comcast.com wrote: Meeting minutes and the logs for the Neutron IPv6 meeting has been posted. We will not meet next week, due to the Thanksgiving holiday in the US. Our next meeting will be Thursday Dec 5th - 2100 UTC, where we will review the goals from this week's meeting and look to create actionable items for I-2. [1] https://wiki.openstack.org/wiki/Meetings/Neutron-IPv6-Subteam [2] http://eavesdrop.openstack.org/meetings/neutron_ipv6/2013/ -- Sean M. Collins ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Unwedging the gate
Hi All, TL;DR Last week the gate got wedged on nondeterministic failures. Unwedging the gate required drastic actions to fix bugs. Starting on November 15th, gate jobs have been getting progressively less stable with not enough attention given to fixing the issues, until we got to the point where the gate was almost fully wedged. No one bug caused this, it was a collection of bugs that got us here. The gate protects us from code that fails 100% of the time, but if a patch fails 10% of the time it can slip through. Add a few of these bugs together and we get the gate to a point where the gate is fully wedged and fixing it without circumventing the gate (something we never want to do) is very hard. It took just 2 new nondeterministic bugs to take us from a gate that mostly worked, to a gate that was almost fully wedged. Last week we found out Jeremy Stanley (fungi) was right when he said, nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so even when it's their patch which is introducing a nondeterministic bug. Side note: This is not the first time we've wedged the gate, the first time was around September 26th, right when we were cutting Havana release candidates. In response we wrote elastic-recheck (http://status.openstack.org/elastic-recheck/) to better track what bugs we were seeing. Gate stability according to Graphite: http://paste.openstack.org/show/53765/ (they are huge because they encode entire queries, so including as a pastebin). After sending out an email to ask for help fixing the top known gate bugs (http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html), we had a few possible fixes. But with the gate wedged, the merge queue was 145 patches long and could take days to be processed. In the worst case, with none of the patches merging, it would take about 1 hour per patch. So on November 20th we asked for a freeze on any non-critical bug fixes (http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html), and kicked everything out of the merge queue and put our possible bug fixes at the front. Even with these drastic measures it still took 26 hours to finally unwedge the gate. In 26 hours we got the check queue failure rate (always higher than the gate failure rate) down from around 87% failure to below 10% failure. And we still have many more bugs to track down and fix in order to improve gate stability. 8 major bug fixes later, we have the gate back to a reasonable failure rate. But how did things get so bad? I'm glad you asked, here is a blow by blow account. The gate has not been completely stable for a very long time, and it only took two new bugs to wedge the gate. Starting with the list of bugs we identified via elastic-recheck, we fixed 4 bugs that have been in the gate for a few weeks already. - https://bugs.launchpad.net/bugs/1224001 test_network_basic_ops fails waiting for network to become available - https://review.openstack.org/57290 was the fix which depended on https://review.openstack.org/53188 and https://review.openstack.org/57475. - This fixed a race condition where the IP address from DHCP was not received by the VM at the right time. Minimize polling on the agent is now defaulted to True, which should reduce the time needed for configuring an interface on br-int consistently.
- https://bugs.launchpad.net/bugs/1252514 Swift returning errors when setup using devstack - Fix https://review.openstack.org/#/c/57373/ - There were a few swift related problems that were sorted out as well. Most had to do with tuning swift properly for its use as a glance backend in the gate, ensuring that timeout values were appropriate for the devstack test slaves (in resource-constrained environments, the swift default timeouts could be tripped frequently (logs showed the request would have finished successfully given enough time)). Swift also had a race-condition in how it constructed its sqlite3 files for containers and accounts, where it was not retrying operations when the database was locked. - https://bugs.launchpad.net/swift/+bug/1243973 Simultaneous PUT requests for the same account... - Fix https://review.openstack.org/#/c/57019/ - This was not on our original list of bugs, but while in bug fix mode, we got this one fixed as well - https://bugs.launchpad.net/bugs/1251784 nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached - Fix https://review.openstack.org/#/c/57509/ - Uncovered on mailing list (http://lists.openstack.org/pipermail/openstack-dev/2013-November/019906.html) - Nova had a very old version of oslo's local.py which is used for managing references to local variables in coroutines. The old version had a pretty significant bug that
Re: [openstack-dev] Unwedging the gate
I have a proposal - I think we should mark all recheck bugs critical, and the respective project PTLs should actively shop around amongst their contributors to get them fixed before other work: we should drive the known set of nondeterministic issues down to 0 and keep it there. -Rob On 25 November 2013 18:00, Joe Gordon joe.gord...@gmail.com wrote: [snip]
[openstack-dev] [Murano] Murano Release 0.3 Announcement
While we are working hard moving Murano towards the Application Catalog (https://wiki.openstack.org/wiki/Murano/ApplicationCatalog) we continue to deliver our ongoing releases. The Murano Team is happy to announce that the new stable version of Murano - v0.3 - has been released. This release contains several features and improvements, of which the ability to manage Linux instances is the major one. Murano was always capable of managing instances with any Operating System, but only now, after the first moves towards our new mission (https://wiki.openstack.org/wiki/Murano/ApplicationCatalog), have we added full-featured support for managing Linux instances. With this release we are introducing direct support for Quantum, the ability to mark images uploaded to Glance with Murano-specific metadata, and many other improvements. A full list of changes and other necessary information can be found in the Release Notes (https://wiki.openstack.org/wiki/Murano/ReleaseNotes_v0.3) on the project Wiki. -- Serg Melikyan, Senior Software Engineer at Mirantis, Inc. http://mirantis.com | smelik...@mirantis.com ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Solum] git-Integration working group
On 25 November 2013 16:04, Adrian Otto adrian.o...@rackspace.com wrote: I think I mentioned this before, but the working group has not made any binding decisions. We already identified Zuul and Nodepool as tools to learn more about. It's not clear to us yet if it makes sense for the simple use case. That will be one of the things we discuss. We do know that Solum requires multi-tenant capability from these tools. They were designed for single-tenant use. We have not yet scoped the effort required to add that. One thing that has could be valuable before holding the first meeting is a written briefing on these tools. Someone who understands them should help with this, or we should do some research. I will be requesting help from a small research team to put something together, so we can study up and arrive well informed. Volunteers welcome. I wrote these docs a few months back. http://ci.openstack.org/running-your-own.html We're going to be deploying these soonish within TripleO clouds for local software management/upgrading of said clouds - ideally we could do that by leveraging Solum. -Rob -- Robert Collins rbtcoll...@hp.com Distinguished Technologist HP Converged Cloud ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Unwedging the gate
On Sun, Nov 24, 2013 at 9:58 PM, Robert Collins robe...@robertcollins.netwrote: I have a proposal - I think we should mark all recheck bugs critical, and the respective project PTLs should actively shop around amongst their contributors to get them fixed before other work: we should drive the known set of nondeterministic issues down to 0 and keep it there. Yes! In fact we are already working towards that. See http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html -Rob On 25 November 2013 18:00, Joe Gordon joe.gord...@gmail.com wrote: Hi All, TL;DR Last week the gate got wedged on nondeterministic failures. Unwedging the gate required drastic actions to fix bugs. Starting on November 15th, gate jobs have been getting progressively less stable with not enough attention given to fixing the issues, until we got to the point where the gate was almost fully wedged. No one bug caused this, it was a collection of bugs that got us here. The gate protects us from code that fails 100% of the time, but if a patch fails 10% of the time it can slip through. Add a few of these bugs together and we get the gate to a point where the gate is fully wedged and fixing it without circumventing the gate (something we never want to do) is very hard. It took just 2 new nondeterministic bugs to take us from a gate that mostly worked, to a gate that was almost fully wedged. Last week we found out Jeremy Stanley (fungi) was right when he said, nondeterministic failures breed more nondeterministic failures, because people are so used to having to reverify their patches to get them to merge that they are doing so even when it's their patch which is introducing a nondeterministic bug. Side note: This is not the first time we wedge the gate, the first time was around September 26th, right when we were cutting Havana release candidates. In response we wrote elastic-recheck (http://status.openstack.org/elastic-recheck/) to better track what bugs we were seeing. Gate stability according to Graphite: http://paste.openstack.org/show/53765/ (they are huge because they encode entire queries, so including as a pastebin). After sending out an email to ask for help fixing the top known gate bugs ( http://lists.openstack.org/pipermail/openstack-dev/2013-November/019826.html ), we had a few possible fixes. But with the gate wedged, the merge queue was 145 patches long and could take days to be processed. In the worst case, none of the patches merging, it would take about 1 hour per patch. So on November 20th we asked for a freeze on any non-critical bug fixes ( http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html ), and kicked everything out of the merge queue and put our possible bug fixes at the front. Even with these drastic measures it still took 26 hours to finally unwedge the gate. In 26 hours we got the check queue failure rate (always higher then the gate failure rate) down from around 87% failure to below 10% failure. And we still have many more bugs to track down and fix in order to improve gate stability. 8 Major bug fixes later, we have the gate back to a reasonable failure rate. But how did things get so bad? I'm glad you asked, here is a blow by blow account. The gate has not been completely stable for a very long time, and it only took two new bugs to wedge the gate. Starting with the list of bugs we identified via elastic-recheck, we fixed 4 bugs that have been in the gate for a few weeks already. 
https://bugs.launchpad.net/bugs/1224001 test_network_basic_ops fails waiting for network to become available

https://review.openstack.org/57290 was the fix, which depended on https://review.openstack.org/53188 and https://review.openstack.org/57475. This fixed a race condition where the IP address from DHCP was not received by the VM at the right time. Minimize polling on the agent now defaults to True, which should consistently reduce the time needed to configure an interface on br-int.

https://bugs.launchpad.net/bugs/1252514 Swift returning errors when setup using devstack

Fix: https://review.openstack.org/#/c/57373/ There were a few Swift-related problems that were sorted out as well. Most had to do with tuning Swift properly for its use as a Glance backend in the gate, ensuring that timeout values were appropriate for the devstack test slaves (in resource-constrained environments, the Swift default timeouts could be tripped frequently; logs showed the requests would have finished successfully given enough time). Swift also had a race condition in how it constructed its sqlite3 files for containers and accounts, where it was not retrying operations when the database was locked.

https://bugs.launchpad.net/swift/+bug/1243973 Simultaneous PUT requests for the same
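As an illustration of the retry-on-lock pattern described above, a minimal sketch might look like the following (hypothetical code, not Swift's actual implementation; the function name and backoff values are assumptions):

```python
import sqlite3
import time

def execute_with_retry(conn, statement, args=(), attempts=5, delay=0.1):
    """Retry a statement when sqlite3 reports the database is locked.

    Hypothetical sketch of the pattern described above, not Swift's code.
    """
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(statement, args)
        except sqlite3.OperationalError as e:
            # Re-raise anything that is not a lock, or the final failure.
            if 'database is locked' not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # back off before retrying
```

Without a retry of this sort, a concurrent writer holding the lock turns into a hard failure for the losing request, which is exactly the kind of low-rate nondeterminism that gate jobs surface.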
Re: [openstack-dev] Unwedging the gate
On 25 November 2013 19:25, Joe Gordon joe.gord...@gmail.com wrote:

On Sun, Nov 24, 2013 at 9:58 PM, Robert Collins robe...@robertcollins.net wrote:

I have a proposal - I think we should mark all recheck bugs critical, and the respective project PTLs should actively shop around amongst their contributors to get them fixed before other work: we should drive the known set of nondeterministic issues down to 0 and keep it there.

Yes! In fact we are already working towards that. See http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html

Indeed, I saw that thread - I think I'm proposing something slightly different, or perhaps 'gate blocking' needs clearing up. Which is: once we have sufficient evidence to believe there is a nondeterministic bug in trunk, whether or not the gate is obviously suffering, we should consider it critical immediately. I don't think we need 24h action on such bugs at that stage - gate-blocking zomg issues obviously do, though! The goal here would be to drive the steady state of 'recheck needed' so low that most people never encounter it, and to break the social pattern that has been building up.

-Rob
--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
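A back-of-the-envelope sketch shows why driving these bugs toward zero matters: even a handful of low-rate nondeterministic bugs compounds into routine rechecks. The per-bug failure rates below are assumed for illustration only, not measured gate data:

```python
def pass_probability(bug_rates):
    """Probability that a single gate run trips none of the flaky bugs."""
    p = 1.0
    for rate in bug_rates:
        p *= 1.0 - rate
    return p

# Assumed rates: five flaky bugs, each tripping 2-10% of runs.
bugs = [0.02, 0.03, 0.05, 0.08, 0.10]
per_run = pass_probability(bugs)
print("single run passes: %.1f%%" % (per_run * 100))  # roughly 75%

# In a dependent merge queue a patch may need several consecutive green
# runs, since it is retested whenever the queue reshuffles ahead of it.
for runs in (1, 3, 5):
    print("%d consecutive green runs: %.1f%%" % (runs, 100 * per_run ** runs))
```

Under these assumed rates, five consecutive green runs succeed well under a quarter of the time, which is how a "mostly working" gate slides into a wedged one.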
Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime
Hi Robert, I see you have enough volunteers. You can put me on the backup list in case somebody drops out or you need additional bodies. Regards ...Juerg

From: Boris Pavlovic [mailto:bpavlo...@mirantis.com]
Sent: Sunday, November 24, 2013 8:09 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime

Robert, btw, I would like to be a volunteer too =) Best regards, Boris Pavlovic

On Sun, Nov 24, 2013 at 10:43 PM, Robert Collins robe...@robertcollins.net wrote:

On 22 November 2013 23:55, Gary Kotton gkot...@vmware.com wrote:

I'm looking for 4-5 folk who have: - modest Nova skills - time to follow a fairly mechanical plan (careful and detailed work needed) to break the status quo around scheduler extraction

I would be happy to take part, but first I think we need to iron out a number of issues:

Cool! Added your name to the list of volunteers, which brings us to 4, the minimum I wanted before starting things happening.

1. Will this be a new service that has an API - for example, will Nova be able to register a host and provide the host statistics?

This will be an RPC API initially, because we know the performance characteristics of the current RPC API, and doing anything different from that is unnecessary risk. Once the new structure is: * stable * gated with unit and tempest tests * with a straightforward and well documented migration path for deployers - then adding a RESTful API could take place.

2. How will the various components interact with the scheduler - same as today, that is, RPC? Or a REST API? The latter is a real concern due to problems we have seen with the interactions of Nova and other services.

RPC initially. REST *only* once we've avoided second-system syndrome.

3. How will current developments fit into this model?

Code sync - take a forklift copy of the code, and apply patches to both for the one cycle.

All in all I think that it is a very good and healthy idea. I have a number of reservations - these are mainly regarding the implementation and the service definition. Basically I like the approach of just getting heads down and doing it, but prior to that I think we need to understand the scope and, mainly, define the interfaces and how they can be used/abused and consumed. It may be a very good topic to discuss at the upcoming scheduler meeting - this may be in the middle of the night for Robert. If so then maybe we can schedule another time.

Tuesdays at 1500 UTC - I'm in UTC+13 at the moment, so that's 0400 local. A little early for me :) I'll ping you on IRC about resolving the concerns you raise, and you can proxy my answers to the sub group meeting?

Please note that this is scheduling and not orchestration. That is also something that we need to resolve.

Yup, sure is.

-Rob
--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
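To make the "RPC first, REST later" shape concrete, here is a hypothetical sketch of what such a client-side surface could look like. The method names update_host_stats and select_destinations, and their payloads, are assumptions for illustration, not Nova's settled scheduler API:

```python
class SchedulerAPI(object):
    """Hypothetical client-side RPC surface for an external scheduler.

    Illustrative only: a REST layer could wrap the same calls later,
    once the split is stable and gated.
    """

    def __init__(self, rpc_client):
        self._client = rpc_client  # e.g. an oslo.messaging RPCClient

    def update_host_stats(self, context, host, stats):
        # Fire-and-forget: hosts periodically report capacity/usage.
        self._client.cast(context, 'update_host_stats',
                          host=host, stats=stats)

    def select_destinations(self, context, request_spec):
        # Blocking call: ask the scheduler to place the workload.
        return self._client.call(context, 'select_destinations',
                                 request_spec=request_spec)
```

Keeping host registration as a cast and placement as a call preserves today's RPC semantics, which is the low-risk property the thread argues for before any RESTful API is layered on top.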