[openstack-dev] Taking a break..
Hey all, Just wanted to drop a quick note to say that I decided to leave Rackspace to pursue another opportunity. My last day was last Friday. I won’t have much time for OpenStack, but I’m going to continue to hang out in the channels. Having been involved in the project since day 1, I’m going to find it difficult to fully walk away. I really don’t know how much I’ll continue to stay involved. I am completely burned out on nova. However, I’d really like to see versioned objects broken out into oslo and Ironic synced with nova’s object advancements. So, if I work on anything, it’ll probably be related to that. Cells will be left in a lot of capable hands. I have shared some thoughts with people on how I think we can proceed to make it ‘the way’ in nova. I’m going to work on documenting some of this in an etherpad so the thoughts aren’t lost. Anyway, it’s been fun… the project has grown like crazy! Keep on trucking... And while I won’t be active much, don’t be afraid to ping me! - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Taking a break..
Thanks, everyone, for the nice comments. Replies to Dan below: On Oct 22, 2014, at 10:52 AM, Dan Smith d...@danplanet.com wrote: I won’t have much time for OpenStack, but I’m going to continue to hang out in the channels. Nope, sorry, veto. I'm the only Core in this project, so I'm sorry: You do not have -2 rights. :) Some options to explain your way out: 1. Oops, I forgot it wasn't April 2. I have a sick sense of humor; I'm getting help for it 3. I've come to my senses after a brief break from reality Seriously, I don't recall a gerrit review for this terrible plan... This could be arranged. #2 is certainly true (I'm only on step 1 of the 10 step program) even though this one is not a joke. :) Well, I for one am really sorry to see you go. I'd be lying if I said I hope that your next opportunity leaves you daydreaming about going back to OpenStack before too long. However, if not, good luck! At the moment, I'm looking forward to new frustrating problems to solve. We'll see what happens. :) Have fun in Paris. And remember this French: Où est la salle de bain? - Chris --Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Nominating Jay Pipes for nova-core
+1 On Jul 30, 2014, at 2:02 PM, Michael Still mi...@stillhq.com wrote: Greetings, I would like to nominate Jay Pipes for the nova-core team. Jay has been involved with nova for a long time now. He's previously been a nova core, as well as a glance core (and PTL). He's been around so long that there are probably other types of core status I have missed. Please respond with +1s or any concerns. References: https://review.openstack.org/#/q/owner:%22jay+pipes%22+status:open,n,z https://review.openstack.org/#/q/reviewer:%22jay+pipes%22,n,z http://stackalytics.com/?module=nova-group&user_id=jaypipes As a reminder, we use the voting process outlined at https://wiki.openstack.org/wiki/Nova/CoreTeam to add members to our core team. Thanks, Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][qa] proposal for moving forward on cells/tempest testing
On Jul 14, 2014, at 10:44 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: Today we only gate on exercises in devstack for cells testing coverage in the gate-devstack-dsvm-cells job. The cells tempest non-voting job was moving to the experimental queue here [1] since it doesn't work with a lot of the compute API tests. I think we all agreed to tar and feather comstud if he didn't get Tempest working (read: passing) with cells enabled in Juno. The first part of this is just figuring out where we sit with what's failing in Tempest (in the check-tempest-dsvm-cells-full job). I'd like to propose that we do the following to get the ball rolling: 1. Add an option to tempest.conf under the compute-feature-enabled section to toggle cells and then use that option to skip tests that we know will fail in cells, e.g. security group tests. I think I was told tempest could infer cells from devstack config or something? I dunno the right way to do this. But, I'm basically +1 to all 3 of these. I think we just skip the broken tests for now and iterate on unskipping things one by one. - Chris 2. Open bugs for all of the tests we're skipping so we can track closing those down, assuming they aren't already reported. [2] 3. Once the known failures are being skipped, we can move check-tempest-dsvm-cells-full out of the experimental queue. I'm not proposing that it'd be voting right away, I think we have to see it burn in for awhile first. With at least this plan we should be able to move forward on identifying issues and getting some idea for how much of Tempest doesn't work with cells and the effort involved in making it work. Thoughts? If there aren't any objections, I said I'd work on the qa-spec and can start doing the grunt-work of opening bugs and skipping tests. 
[1] https://review.openstack.org/#/c/87982/ [2] https://bugs.launchpad.net/nova/+bugs?field.tag=cells+ -- Thanks, Matt Riedemann ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
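A config-driven skip like the one proposed in step 1 could look roughly like this. This is a hedged sketch: the option name (`cells` under a compute-feature-enabled section) and the stand-in config class are assumptions, and real Tempest wires its options through oslo.config rather than a hand-rolled object.

```python
import unittest


class ComputeFeatureEnabled:
    """Stand-in for tempest.conf's [compute-feature-enabled] section.
    The 'cells' option name is hypothetical."""
    cells = True  # would be parsed from tempest.conf in practice


CONF = ComputeFeatureEnabled()


def skip_if_cells(reason):
    """Decorator: skip a test known to fail when cells is enabled."""
    return unittest.skipIf(CONF.cells, reason)


class SecurityGroupTest(unittest.TestCase):
    @skip_if_cells("security groups are not supported with cells")
    def test_create_security_group(self):
        self.fail("would exercise the compute API here")
```

Per step 2, each skipped test would carry a bug reference so the skips can be unwound one by one.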
Re: [openstack-dev] [oslo] Asyncio and oslo.messaging
On Jul 7, 2014, at 11:11 AM, Angus Salkeld angus.salk...@rackspace.com wrote: On 03/07/14 05:30, Mark McLoughlin wrote: Hey This is an attempt to summarize a really useful discussion that Victor, Flavio and I have been having today. At the bottom are some background links - basically what I have open in my browser right now thinking through all of this. We're attempting to take baby-steps towards moving completely from eventlet to asyncio/trollius. The thinking is for Ceilometer to be the first victim. Has this been widely agreed on? It seems to me like we are mixing two issues: Right. Does someone have a pointer to where this was decided? - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] should we have a stale data indication in nova list/show?
I don't think we should be flipping states for instances on a potentially downed compute. We definitely should not set an instance to ERROR. I think a time associated with the last power state check might be nice and be good enough. - Chris On Jun 24, 2014, at 5:17 PM, Joe Gordon joe.gord...@gmail.com wrote: On Tue, Jun 24, 2014 at 5:12 PM, Joe Gordon joe.gord...@gmail.com wrote: On Tue, Jun 24, 2014 at 4:16 PM, Ahmed RAHAL ara...@iweb.com wrote: On 2014-06-24 17:38, Joe Gordon wrote: On Jun 24, 2014 2:31 PM, Russell Bryant rbry...@redhat.com mailto:rbry...@redhat.com wrote: There be dragons here. Just because Nova doesn't see the node reporting in, doesn't mean the VMs aren't actually still running. I think this needs to be left to logic outside of Nova. For example, if your deployment monitoring really does think the host is down, you want to make sure it's *completely* dead before taking further action such as evacuating the host. You certainly don't want to risk having the VM running on two different hosts. This is just a business I don't think Nova should be getting into. I agree nova shouldn't take any actions. But I don't think leaving an instance as 'active' is right either. I was thinking move the instance to an error state (maybe an unknown state would be more accurate) and let the user deal with it, versus just letting the user deal with everything. Since nova knows something *may* be wrong, shouldn't we convey that to the user? (I'm not 100% sure we should myself.) I saw compute nodes going down, from a management perspective (say, nova-compute disappeared), but VMs were just fine. Reporting on the state may be misleading. The 'unknown' state would fit, but nothing lets us presume the VMs are non-functional or impacted. Nothing lets us presume the opposite as well. We don't know if the instance is still up. As far as an operator is concerned, a compute node not responding is reason enough to check the situation.
To go further about other comments related to customer feedback, there are many reasons a customer may think his VM is down, so showing him 'useful information' in some cases will only trigger more anxiety. Besides, people will start hammering the API to check 'state' instead of using proper monitoring. But, state is already reported if the customer shuts down a VM, so ... Currently, compute node state reporting is done by the nova-compute process itself, reporting back with a time stamp to the database (through conductor, if I recall correctly). It's more like a watchdog than a reporting system. For VMs (assuming we find it useful) the same kind of process could occur: nova-compute reporting back all states with time stamps for all VMs it hosts. This would then have to be optional, as I already sense scaling/performance issues here (ceilometer, anyone?). Finally, assuming the customer had access to this 'unknown' state information, what would he be able to do with it? Usually he has no lever to 'evacuate' or 'recover' the VM. All he could do is spawn another instance to replace the lost one. But only if the VM really is currently unavailable, information he must get from other sources. If I was a user, and my instance went to an 'UNKNOWN' state, I would check if it's still operating, and if not, delete it and start another instance. The alternative is how things work today: if a nova-compute goes down we don't change any instance states, and the user is responsible for making sure their instance is still operating even if the instance is set to ACTIVE. So, I see how the state reporting could be useful information, but am not sure that nova Status is the right place for it. Ahmed.
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
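Chris's "time associated with the last power state check" could surface in the API as an extra field plus a staleness flag, instead of flipping the instance to ERROR/UNKNOWN. A minimal sketch of the idea; the field names and the 60-second threshold are assumptions, not nova's actual API:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=60)  # assumed reporting-interval threshold


def power_state_view(power_state, last_checked_at, now=None):
    """Report the last-known power state with its timestamp and a
    'stale' flag, leaving the instance state itself untouched."""
    now = now or datetime.now(timezone.utc)
    return {
        "power_state": power_state,
        "power_state_checked_at": last_checked_at.isoformat(),
        "stale": (now - last_checked_at) > STALE_AFTER,
    }
```

The operator's monitoring (not nova) still decides what, if anything, to do about a stale host; the user merely sees how fresh the reported state is.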
Re: [openstack-dev] [Nova] Nominating Ken'ichi Ohmichi for nova-core
+1 On Jun 13, 2014, at 3:40 PM, Michael Still mi...@stillhq.com wrote: Greetings, I would like to nominate Ken'ichi Ohmichi for the nova-core team. Ken'ichi has been involved with nova for a long time now. His reviews on API changes are excellent, and he's been part of the team that has driven the new API work we've seen in recent cycles forward. Ken'ichi has also been reviewing other parts of the code base, and I think his reviews are detailed and helpful. Please respond with +1s or any concerns. References: https://review.openstack.org/#/q/owner:ken1ohmichi%2540gmail.com+status:open,n,z https://review.openstack.org/#/q/reviewer:ken1ohmichi%2540gmail.com,n,z http://www.stackalytics.com/?module=nova-group&user_id=oomichi As a reminder, we use the voting process outlined at https://wiki.openstack.org/wiki/Nova/CoreTeam to add members to our core team. Thanks, Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Proposal: remove the server groups feature
On Apr 25, 2014, at 2:15 PM, Jay Pipes jaypi...@gmail.com wrote: Hi Stackers, When recently digging in to the new server group v3 API extension introduced in Icehouse, I was struck with a bit of cognitive dissonance that I can't seem to shake. While I understand and support the idea behind the feature (affinity and anti-affinity scheduling hints), I can't help but feel the implementation is half-baked and results in a very awkward user experience. I agree with all you said about this. Proposal I propose to scrap the server groups API entirely and replace it with a simpler way to accomplish the same basic thing. Create two new options to nova boot: --near-tag TAG and --not-near-tag TAG The first would tell the scheduler to place the new VM near other VMs having a particular tag. The latter would tell the scheduler to place the new VM *not* near other VMs with a particular tag. What is a tag? Well, currently, since the Compute API doesn't have a concept of a single string tag, the tag could be a key=value pair that would be matched against the server extra properties. You can actually already achieve this behavior… although with a little more work. There’s the Affinity filter which allows you to specify a same_host/different_host scheduler hint where you explicitly specify the instance uuids you want… (the extra work is having to know the instance uuids). But yeah, I think this makes more sense to me. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
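The existing work-around Chris mentions is the `--hint same_host=<uuid>` / `--hint different_host=<uuid>` scheduler hints. The core check those affinity filters perform is small set logic; a hedged sketch of it (illustrative function names, not nova's actual filter code):

```python
def different_host_passes(instances_on_host, hinted_uuids):
    """A host passes DifferentHostFilter-style anti-affinity only if
    none of the hinted instances already run on it."""
    return not set(instances_on_host) & set(hinted_uuids)


def same_host_passes(instances_on_host, hinted_uuids):
    """A host passes SameHostFilter-style affinity only if every hinted
    instance already runs on it."""
    return set(hinted_uuids) <= set(instances_on_host)
```

The "extra work" in the thread is that the caller must already know those instance uuids up front, which is exactly what a tag-based hint would avoid.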
Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells
On Apr 23, 2014, at 6:36 PM, Sam Morrison sorri...@gmail.com wrote: Yeah I’m not sure what’s going on, I removed my hacks and tried it using the conductor rpcapi service and got what I think is a recursive call in nova-conductor. Added more details to https://bugs.launchpad.net/nova/+bug/1308805 I’m thinking there may be something missing in the stable/havana branch or else cells is doing something different when it comes to objects. I don’t think it is a cells issue though; debugging it, it seems like it just can’t back port a 1.13 object to 1.9. Cheers, Sam Oh. You know, it turns out that conductor API bug you found…was really not a real bug, I don’t think. The only thing that can backport is the conductor service, if the conductor service has been upgraded. I.e., ‘use_local’ would never ever work, because it was the local service that didn’t understand the new object version to begin with. So trying to use_local would still not understand the new version. Make sense? (This should probably be made to fail gracefully, however :) And yeah, I think what you have going on now when you’re actually using the conductor… is that conductor is getting a request to backport, but it doesn’t know how to backport…. so it’s kicking it to itself to backport.. and infinite recursion occurs. Do you happen to have use_local=False in your nova-conductor nova.conf? That would cause nova-conductor to RPC to itself to try to backport, hehe. Again, we should probably have some graceful failing here in some way. 1) nova-conductor should probably always force use_local=True. And the LocalAPI should probably just implement object_backport() such that it raises a nice error. So, does your nova-conductor not have object version 1.13? As I was trying to get at in a previous reply, I think the only way this can possibly work is that you have Icehouse nova-conductor running in ALL cells.
- Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
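The reason use_local can never help, restated as code: a service can only backport object versions it actually knows about, so asking the local (older) code to do it must fail, and a conductor RPCing the request back to itself recurses. A hedged sketch with made-up version sets, not nova's real object registry:

```python
class IncompatibleObjectVersion(Exception):
    """Raised when a service is asked to backport a version it has never
    heard of -- the 'fail gracefully' path instead of RPC recursion."""


def object_backport(known_versions, obj_version, target_version):
    """Only a service that understands obj_version can downgrade it."""
    if obj_version not in known_versions:
        raise IncompatibleObjectVersion(
            "this service does not know object version %s" % obj_version)
    return target_version  # a real implementation would rewrite fields here


ICEHOUSE_KNOWN = {"1.9", "1.10", "1.11", "1.12", "1.13"}  # illustrative
HAVANA_KNOWN = {"1.9", "1.10", "1.11"}                    # illustrative
```

An Icehouse conductor can take a 1.13 object down to 1.9; a Havana service handed the same request has nowhere to go, which is why use_local=True on the caller never works.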
Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells
On Apr 24, 2014, at 6:10 AM, Sam Morrison sorri...@gmail.com wrote: Hmm, I may have, but I’ve just done another test with everything set to use_local=False except nova-conductor, where use_local=True. I also reverted that change I put through as mentioned above and I still get an infinite loop. Can’t really figure out what is going on here. Conductor is trying to talk to conductor and use_local definitely equals True. (this is all with havana conductor btw) Interesting. So, does your nova-conductor not have object version 1.13? As I was trying to get at in a previous reply, I think the only way this can possibly work is that you have Icehouse nova-conductor running in ALL cells. OK so in my compute cell I am now running an Icehouse conductor. Everything else is Havana including the DB version. This actually seems to make all the things that didn’t work now work. However it also means that the thing that did work (booting an instance) no longer works. This is an easy fix and just requires nova-conductor to call the run_instance scheduler rpcapi method with version 2.9 as opposed to the icehouse version 3.0. I don’t think anything has changed here so this might be an easy fix that could be pushed upstream. It just needs to change the scheduler rpcapi to be aware of what version it can use. I set upgrade_levels scheduler=havana but that wasn’t handled by the scheduler rpcapi and just gave a version not new enough exception. I think I’m making progress….. Cool. So, what is tested upstream is upgrading everything except nova-compute. You could try upgrading nova-scheduler as well. Although, I didn’t think we had any build path going through conductor yet. Do you happen to have a traceback from that? (Curious what the call path looks like) - Chris Sam ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Oslo] [Ironic] Can we change rpc_thread_pool_size default value?
Fwiw, we've seen this with nova-scheduler as well. I think the default pool size is too large in general. The problem that I've seen stems from the fact that DB calls all block and you can easily get a stack of 64 workers all waiting to do DB calls. And it happens to work out such that none of the rpc pool threads return before all run their DB calls. This is compounded by the explicit yield we have for every DB call in nova. Anyway, this means that all of the workers are tied up for quite a while. Since nova casts to the scheduler, it doesn't impact the API much. But if you were waiting on an RPC response, you could be waiting a while. Ironic does a lot of RPC calls. I don't think we know the exact behavior in Ironic, but I'm assuming it's something similar. If all rpc pool threads are essentially stuck until roughly the same time, you end up with API hangs. But we're also seeing periodic task run delays as well. It must be getting stuck behind a lot of the rpc worker threads such that lowering the number of threads helps considerably. Given DB calls all block the process right now, there's really not much advantage to a larger pool size. 64 is too much, IMO. It would make more sense if there was more IO that could be parallelized. That didn't answer your question. I've been meaning to ask the same one since we discovered this. :) - Chris On Apr 22, 2014, at 3:54 PM, Devananda van der Veen devananda@gmail.com wrote: Hi! When a project is using oslo.messaging, how can we change our default rpc_thread_pool_size? --- Background Ironic has hit a bug where a flood of API requests can deplete the RPC worker pool on the other end and cause things to break in very bad ways. Apparently, nova-conductor hit something similar a while back too. 
There've been a few long discussions on IRC about it, tracked partially here: https://bugs.launchpad.net/ironic/+bug/1308680 tldr; a way we can fix this is to set the rpc_thread_pool_size very small (eg, 4) and keep our conductor.worker_pool size near its current value (eg, 64). I'd like these to be the default option values, rather than require every user to change the rpc_thread_pool_size in their local ironic.conf file. We're also about to switch from the RPC module in oslo-incubator to using the oslo.messaging library. Why are these related? Because it looks impossible for us to change the default for this option from within Ironic, because the option is registered when EventletExecutor is instantiated (rather than loaded). https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L76 Thanks, Devananda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
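The queueing effect Chris describes can be reproduced with plain OS threads standing in for the eventlet green-thread pool (under eventlet it is worse, since unpatched DB drivers block the entire process). A hedged sketch; the 0.05 s sleep is a stand-in for a blocking DB call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_db_call():
    time.sleep(0.05)  # stand-in for a DB query that blocks its worker
    return "row"


def time_to_drain(pool_size, n_requests):
    """Time until the last of n_requests RPC handlers, each making one
    blocking DB call, finishes on a pool of pool_size workers."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for f in [pool.submit(blocking_db_call) for _ in range(n_requests)]:
            f.result()
    return time.monotonic() - start
```

With 12 requests on a 4-worker pool, the last caller waits roughly three call-times, and anything queued behind them (like a periodic task) waits even longer -- which is the hang described above. When the calls block anyway, a bigger pool only lets more of them pile up in-flight at once.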
Re: [openstack-dev] [Openstack] [nova] Havana - Icehouse upgrades with cells
On Apr 19, 2014, at 11:08 PM, Sam Morrison sorri...@gmail.com wrote: Thanks for the info Chris, I’ve actually managed to get things working. Haven’t tested everything fully but it seems to be working pretty well. On 19 Apr 2014, at 7:26 am, Chris Behrens cbehr...@codestud.com wrote: The problem here is that Havana is not going to know how to backport the Icehouse object, even if it had the conductor methods to do so… unless you’re running the Icehouse conductor. But yes, your nova-computes would also need the code to understand to hit conductor to do the backport, which we must not have in Havana? OK this conductor api method was actually back ported to Havana, it kept its 1.62 version for the method but in Havana conductor manager it is set to 1.58. That is easily fixed but then it gets worse. I may be missing something but the object_backport method doesn’t work at all and looking at the signature never worked? I’ve raised a bug: https://bugs.launchpad.net/nova/+bug/1308805 (CCing openstack-dev and Dan Smith) That looked wrong to me as well, and then I talked with Dan Smith and he reminded me the RPC deserializer would turn that primitive into an object on the conductor side. The primitive there is the full primitive we use to wrap the object with the versioning information, etc. Does your backport happen to not pass the full object primitive? Or maybe missing the object RPC deserializer on conductor? (I would think that would have to be set in Havana) nova/service.py would have: serializer = objects_base.NovaObjectSerializer(); self.rpcserver = rpc.get_server(target, endpoints, serializer); self.rpcserver.start() I’m guessing that’s there… so I would think maybe the object_backport call you have is not passing the full primitive. I don’t have the time to peek at your code on github right this second, but maybe later.
:) - Chris This also means that if you don’t want your computes on Icehouse yet, you must actually be using nova-conductor and not use_local=True for it. (I saw the patch go up to fix the objects use of conductor API… so I’m guessing you must be using local right now?) Yeah we still haven’t moved to use conductor so if you also don’t use conductor you’ll need the simple fix at bug: https://bugs.launchpad.net/nova/+bug/1308811 So, I think an upgrade process could be: 1) Backport the ‘object backport’ code into Havana. 2) Set up *Icehouse* nova-conductor in your child cells and use_local=False on your nova-computes 3) Restart your nova-computes. 4) Update *all* nova-cells processes (in all cells) to Icehouse. You can keep use_local=False on these, but you’ll need that object conductor API patch. At this point you’d have all nova-cells and all nova-conductors on Icehouse and everything else on Havana. If the Havana computes are able to talk to the Icehouse conductors, they should be able to backport any newer object versions. Same with nova-cells receiving older objects from nova-api. It should be able to backport them. After this, you should be able to upgrade nova-api… and then probably upgrade your nova-computes on a cell-by-cell basis. I don’t *think* nova-scheduler is getting objects yet, especially if you’re somehow magically able to get builds to work in what you tested so far. :) But if it is, you may find that you need to insert an upgrade of your nova-schedulers to Icehouse between steps 3 and 4 above…or maybe just after #4… so that it can backport objects, also. I still doubt this will work 100%… but I dunno. :) And I could be missing something… but… I wonder if that makes sense? What I have is an Icehouse API cell and a Havana compute cell and havana compute nodes with the following changes: Change the method signature of attach_volume to match icehouse, the additional arguments are optional and don’t seem to break things if you ignore them. 
https://bugs.launchpad.net/nova/+bug/1308846 Needed a small fix for unlocking, there is a race condition that I have a fix for but haven’t pushed up. Then I hacked up a fix for object back porting. The code is at https://github.com/NeCTAR-RC/nova/commits/nectar/havana-icehouse-compat The last three commits are the fixes needed. I still need to push up the unlocking one and also a minor fix for metadata syncing with deleting and notifications. Would love to get the object back porting stuff fixed properly from someone who knows how all the object stuff works. Cheers, Sam ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
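The wrap/unwrap being discussed: the "full primitive" is an envelope carrying the object's name and version, which is what the server-side deserializer (and object_backport) dispatches on. A hedged sketch that mimics the shape of nova's envelope with greatly simplified field handling:

```python
def obj_to_primitive(obj):
    """Wrap an object in a versioned envelope for the wire."""
    return {"nova_object.name": type(obj).__name__,
            "nova_object.version": obj.VERSION,
            "nova_object.data": dict(obj.__dict__)}


def obj_from_primitive(primitive, registry):
    """Server-side deserializer: rebuild the object from the envelope.
    Without the full envelope there is no name/version to dispatch on,
    which is why object_backport must be handed the whole primitive."""
    cls = registry[primitive["nova_object.name"]]
    obj = cls.__new__(cls)
    obj.__dict__.update(primitive["nova_object.data"])
    return obj


class Instance:
    """Toy versioned object; nova's real objects track fields/changes."""
    VERSION = "1.13"

    def __init__(self, uuid):
        self.uuid = uuid
```

A conductor that receives such a primitive can inspect `nova_object.version`, downgrade the data, and send the older envelope back, provided it knows the newer version in the first place.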
Re: [openstack-dev] [Ironic] Should we adopt a blueprint design process
+1 On Apr 17, 2014, at 12:27 PM, Russell Haering russellhaer...@gmail.com wrote: Completely agree. We're spending too much time discussing features after they're implemented, which makes contribution more difficult for everyone. Forcing an explicit design+review process, using the same tools as we use for coding+review seems like a great idea. If it doesn't work we can iterate. On Thu, Apr 17, 2014 at 11:01 AM, Kyle Mestery mest...@noironetworks.com wrote: On Thu, Apr 17, 2014 at 12:11 PM, Devananda van der Veen devananda@gmail.com wrote: Hi all, The discussion of blueprint review has come up recently for several reasons, not the least of which is that I haven't yet reviewed many of the blueprints that have been filed recently. My biggest issue with launchpad blueprints is that they do not provide a usable interface for design iteration prior to writing code. Between the whiteboard section, wikis, and etherpads, we have muddled through a few designs (namely cinder and ceilometer integration) with accuracy, but the vast majority of BPs are basically reviewed after they're implemented. This seems to be a widespread objection to launchpad blueprints within the OpenStack community, which others are trying to solve. Having now looked at what Nova is doing with the nova-specs repo, and considering that TripleO is also moving to that format for blueprint submission, and considering that we have a very good 'review things in gerrit' culture in the Ironic community already, I think it would be a very positive change. For reference, here is the Nova discussion thread: http://lists.openstack.org/pipermail/openstack-dev/2014-March/029232.html and the specs repo BP template: https://github.com/openstack/nova-specs/blob/master/specs/template.rst So, I would like us to begin using this development process over the course of Juno.
We have a lot of BPs up right now that are light on details, and, rather than iterate on each of them in launchpad, I would like to propose that: * we create an ironic-specs repo, based on Nova's format, before the summit * I will begin reviewing BPs leading up to the summit, focusing on features that were originally targeted to Icehouse and didn't make it, or are obviously achievable for J1 * we'll probably discuss blueprints and milestones at the summit, and will probably adjust targets * after the summit, for any BP not targeted to J1, we require blueprint proposals to go through the spec review process before merging any associated code. Cores and interested parties, please reply to this thread with your opinions. I think this is a great idea Devananda. The Neutron community has moved to this model for Juno as well, and people have been very positive so far. Thanks, Kyle -- Devananda ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] oslo removal of use_tpool conf option
I’m going to try to not lose my cool here, but I’m extremely upset by this. In December, oslo apparently removed the code for ‘use_tpool’ which allows you to run DB calls in Threads because it was ‘eventlet specific’. I noticed this when a review was posted to nova to add the option within nova itself: https://review.openstack.org/#/c/59760/ I objected to this and asked (more demanded) for this to be added back into oslo. It was not. What I did not realize when I was reviewing this nova patch, was that nova had already synced oslo’s change. And now we’ve released Icehouse with a conf option missing that existed in Havana. Whatever projects were using oslo’s DB API code has had this option disappear (unless an alternative was merged). Maybe it’s only nova.. I don’t know. Some sort of process broke down here. nova uses oslo. And oslo removed something nova uses without deprecating or merging an alternative into nova first. How I believe this should have worked: 1) All projects using oslo’s DB API code should have merged an alternative first. 2) Remove code from oslo. 3) Then sync oslo. What do we do now? I guess we’ll have to back port the removed code into nova. I don’t know about other projects. NOTE: Very few people are probably using this, because it doesn’t work without a patched eventlet. However, Rackspace happens to be one that does. And anyone waiting on a new eventlet to be released such that they could use this with Icehouse is currently out of luck. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
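What use_tpool does, in miniature: hand a blocking DB call to a native thread so the cooperative event loop keeps servicing other greenthreads. eventlet's real `tpool.execute` does this against its own native-thread pool; the sketch below uses `concurrent.futures` as a stand-in, so it illustrates the shape of the mechanism rather than the actual oslo/eventlet code:

```python
from concurrent.futures import ThreadPoolExecutor

_DB_POOL = ThreadPoolExecutor(max_workers=10)  # pool size is an assumption


def tpool_execute(func, *args, **kwargs):
    """Run a blocking call on a native thread and wait for the result.
    Under eventlet, this wait would yield to other greenthreads instead
    of the whole process blocking inside the DB driver."""
    return _DB_POOL.submit(func, *args, **kwargs).result()


def instance_get(instance_id):
    # stand-in for a sqlalchemy query that blocks in C code
    return {"id": instance_id, "state": "active"}
```

With the option enabled, oslo's DB API wrapped each call roughly this way; with it removed, every DB call once again blocks the eventlet hub for its full duration.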
Re: [openstack-dev] oslo removal of use_tpool conf option
On Apr 17, 2014, at 4:26 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: Just an honest question (no negativity intended I swear!). If a configuration option exists and only works with a patched eventlet why is that option an option to begin with? (I understand the reason for the patch, don't get me wrong). Right, it’s a valid question. This feature has existed one way or another in nova for quite a while. Initially the implementation in nova was wrong. I did not know that eventlet was also broken at the time, although I discovered it in the process of fixing nova’s code. I chose to leave the feature because it’s something that we absolutely need long term, unless you really want to live with DB calls blocking the whole process. I know I don’t. Unfortunately the bug in eventlet is out of our control. (I made an attempt at fixing it, but it’s not 100%. Eventlet folks currently have an alternative up that may or may not work… but certainly is not in a release yet.) We have an outstanding bug on our side to track this, also. The below is comparing apples/oranges for me. - Chris Most users would not be able to use such a configuration since they do not have this patched eventlet (I assume a newer version of eventlet someday in the future will have this patch integrated in it?) so although I understand the frustration around this I don't understand why it would be an option in the first place. An aside, if the only way to use this option is via a non-standard eventlet then how is this option tested in the community, aka outside of said company? 
An example: If yahoo has some patched kernel A that requires an XYZ config turned on in openstack and the only way to take advantage of kernel A is with XYZ config 'on', then it seems like that’s a yahoo-only patch that is not testable and usable for others; even if patched kernel A is somewhere on github, it's still imho not something that should be an option in the community (anyone can throw stuff up on github and then say I need XYZ config to use it). To me non-standard patches that require XYZ config in openstack shouldn't be part of the standard openstack, no matter the company. If patch A is in the mainline kernel (or other mainline library), then sure it's fair game. -Josh From: Chris Behrens cbehr...@codestud.com Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org Date: Thursday, April 17, 2014 at 3:20 PM To: OpenStack Development Mailing List openstack-dev@lists.openstack.org Subject: [openstack-dev] oslo removal of use_tpool conf option [quoted message trimmed; the original post appears in full above] ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Thoughts from the PTL
On Apr 13, 2014, at 9:58 PM, Michael Still mi...@stillhq.com wrote: First off, thanks for electing me as the Nova PTL for Juno. I find the […] First off, congrats! * a mid cycle meetup. I think the Icehouse meetup was a great success, and I'd like to see us do this again in Juno. I'd also like to get the location and venue nailed down as early as possible, so that people who have complex travel approval processes have a chance to get travel sorted out. I think it's pretty much a foregone conclusion this meetup will be somewhere in the continental US. If you're interested in hosting a meetup in approximately August, please mail me privately so we can chat. I think one of the outcomes from the first one was that we should try to do it earlier. Feature freeze would be somewhere around the first week of September. I’d like to see us do it the last week of July at the latest, I think. That is still ‘approximately August’, I guess. :) Thoughts? - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Dropping or weakening the 'only import modules' style guideline - H302
On Apr 9, 2014, at 12:50 PM, Dan Smith d...@danplanet.com wrote: So I'm a soft -1 on dropping it from hacking. Me too. from testtools import matchers ... Or = matchers.Or LessThan = matchers.LessThan ... This is the right way to do it, IMHO, if you have something like matchers.Or that needs to be treated like part of the syntax. Otherwise, module-only imports massively improves the ability to find where something comes from. +1 My eyes bleed when I open up a python script and find 1 million imports for individual functions and classes. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
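The pattern being endorsed in this thread — a module import plus an explicit local alias for names used like syntax — looks like the following (shown with a stdlib module rather than testtools so the snippet is self-contained):

```python
# H302's rule is "import modules, not objects". When an attribute is
# used so often that a local name helps (the matchers.Or case above),
# the suggestion is to alias it *after* a module import, so the origin
# of the name is still answerable by grep.
import collections

# module import first, then an explicit, greppable local alias:
OrderedDict = collections.OrderedDict

d = OrderedDict([('a', 1), ('b', 2)])
print(list(d))  # ['a', 'b']
```

A reader who sees `OrderedDict` mid-file can find `OrderedDict = collections.OrderedDict` near the imports, which is exactly the discoverability the guideline protects.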
Re: [openstack-dev] Rolling upgrades in icehouse
On Mar 24, 2014, at 12:31 PM, Tim Bell tim.b...@cern.ch wrote: How does this interact with cells? Can the cell API instances be upgraded independently of the cells themselves? My ideal use case would be - It would be possible to upgrade one of the cells (such as a QA environment) before the cell API nodes - Cells can be upgraded one-by-one as needed by stability/functionality - API cells can be upgraded during this process ... i.e. midway before the most critical cells are migrated Is this approach envisaged? That would be my goal long term, but I’m not sure it’ll work right now. :) We did try to take care in making sure that the cells manager is backwards compatible. I think all messages going DOWN to the child cell from the API will work. However, what I could possibly see as broken is messages coming from a child cell back up to the API cell. I believe we changed instance updates to pass objects back up… The objects will fail to deserialize right now in the API cell, because it could get a newer version and not know how to deal with it. If we added support to make nova-cells always redirect via conductor, it could actually down-dev the object, but that has performance implications because of all of the DB updates the API nova-cells does. There are a number of things that I think cells doesn’t pass as objects yet, either, which could be a problem. So, in other words, I think the answer right now is there really is no great upgrade plan wrt cells other than just taking a hit and doing everything at once. I’d love to fix that, as I think it should work as you describe some day. We have work to do to make sure we’re actually passing objects everywhere… and then need to think about how we can get the API cell to be able to deserialize newer object versions. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
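A toy model of the version-negotiation idea sketched in this reply — purely illustrative, not Nova's actual versioned-object code: a deserializer that receives an object newer than it understands asks a stand-in "conductor" to backport it to a version it can handle.

```python
# Not Nova code: a minimal model of "down-dev via conductor". All class
# and field names here are hypothetical.
def _ver(v):
    major, minor = v.split('.')
    return (int(major), int(minor))


class FakeConductor:
    """Stands in for a newer-versioned service that can downgrade."""
    def backport(self, obj, target_version):
        older = dict(obj)
        older['version'] = target_version
        older.pop('new_field', None)  # drop fields the old version lacks
        return older


class Deserializer:
    def __init__(self, supported='1.2', conductor=None):
        self.supported = supported
        self.conductor = conductor

    def deserialize(self, obj):
        if _ver(obj['version']) > _ver(self.supported):
            if self.conductor is None:
                # today's API-cell failure mode: no one to backport
                raise ValueError('cannot handle version %s' % obj['version'])
            obj = self.conductor.backport(obj, self.supported)
        return obj


d = Deserializer(conductor=FakeConductor())
obj = d.deserialize({'version': '1.3', 'name': 'inst', 'new_field': 'x'})
print(obj['version'])  # 1.2
```

The performance concern in the email corresponds to the extra round trip to the conductor on every incoming update.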
Re: [openstack-dev] [nova] nova-compute not re-establishing connectivity after controller switchover
Do you have some sort of network device like a firewall between your compute and rabbit, or did you fail over from one rabbit to another? The only cases where I've seen this happen are when the compute-side OS doesn't detect a closed connection for various reasons. I'm on my phone and didn't check your logs, but thought I'd throw it out there. If the OS (linux) doesn't know the connection is dead, then obviously the user land software will not, either. You can netstat on both sides of the connection to see if something is out of whack. On Mar 24, 2014, at 10:40 AM, Chris Friesen chris.frie...@windriver.com wrote: On 03/24/2014 11:31 AM, Chris Friesen wrote: It looks like we're raising RecoverableConnectionError: connection already closed down in /usr/lib64/python2.7/site-packages/amqp/abstract_channel.py, but nothing handles it. It looks like the most likely place that should be handling it is nova.openstack.common.rpc.impl_kombu.Connection.ensure(). In the current oslo.messaging code the ensure() routine explicitly handles connection errors (which RecoverableConnectionError is) and socket timeouts--the ensure() routine in Havana doesn't do this. I misread the code, ensure() in Havana does in fact monitor socket timeouts, but it doesn't handle connection errors. It looks like support for handling connection errors was added to oslo.messaging just recently in git commit 0400cbf. The git commit comment talks about clustered rabbit nodes and mirrored queues which doesn't apply to our scenario, but I suspect it would probably fix the problem that we're seeing as well. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
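One common way to make the OS notice a silently dead peer — the failure mode described above — is TCP keepalive (AMQP heartbeats are the application-level equivalent). A sketch follows; the tuning constants are Linux-specific, so they are guarded, and the numbers are illustrative, not recommended values:

```python
# Sketch: enable kernel keepalive probes on a socket so a vanished peer
# eventually surfaces as a connection error instead of hanging forever.
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Ask the kernel to probe idle connections for dead peers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform-specific (these exist on Linux),
    # so guard each one:
    for name, val in (('TCP_KEEPIDLE', idle),
                      ('TCP_KEEPINTVL', interval),
                      ('TCP_KEEPCNT', count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), val)
    return sock


s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))  # True
s.close()
```

Keepalive only helps when the kernel genuinely can't see the close (e.g. a firewall silently dropping the flow); the oslo.messaging `ensure()` fix discussed above handles the case where the error does surface but nothing caught it.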
Re: [openstack-dev] [nova] An analysis of code review in Nova
I'd like to get spawn broken up sooner rather than later, personally. It has additional benefits of being able to do better orchestration of builds from conductor, etc. On Mar 14, 2014, at 3:58 PM, Dan Smith d...@danplanet.com wrote: Just to answer this point, despite the review latency, please don't be tempted to think one big change will get in quicker than a series of little, easy to review, changes. All changes are not equal. A large change often scares me away toward easier-to-review patches. Seems like, for Juno-1, it would be worth cancelling all non-urgent bug fixes, and doing the refactoring we need. I think the aim here should be better (and easier to understand) unit test coverage. That's a great way to drive good code structure. Review latency will be directly affected by how well the refactoring changes are staged. If they are small, on-topic and easy to validate, they will go quickly. They should be linearized unless there are some places where multiple sequences of changes make sense (i.e. refactoring a single file that results in no changes required to others). As John says, if it's just a big change everything patch, or a ton of smaller ones that don't fit a plan or process, then it will be slow and painful (for everyone). +1 sounds like a good first step is to move to oslo.vmware I'm not sure whether I think that refactoring spawn would be better done first or second. My gut tells me that doing spawn first would mean that we could more easily validate the oslo refactors because (a) spawn is impossible to follow right now and (b) refactoring it to smaller methods should be fairly easy. The tests for spawn are equally hard to follow and refactoring it first would yield a bunch of more unit-y tests that would help us follow the oslo refactoring. However, it sounds like the osloification has maybe already started and that refactoring spawn will have to take a backseat to that. 
--Dan ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Backwards incompatible API changes
FWIW, I’m fine with any of the options posted. But I’m curious about the precedent that reverting would create. It essentially sounds like if we release a version with an API bug, the bug is no longer a bug in the API and the bug becomes a bug in the documentation. The only way to ‘fix’ the API then would be to rev it. Is that an accurate representation and is that desirable? Or do we just say we take these on a case-by-case basis? - Chris On Mar 21, 2014, at 10:34 AM, David Kranz dkr...@redhat.com wrote: On 03/21/2014 05:04 AM, Christopher Yeoh wrote: On Thu, 20 Mar 2014 15:45:11 -0700 Dan Smith d...@danplanet.com wrote: I know that our primary delivery mechanism is releases right now, and so if we decide to revert before this gets into a release, that's cool. However, I think we need to be looking at CD as a very important use-case and I don't want to leave those folks out in the cold. I don't want to cause issues for the CD people, but perhaps it won't be too disruptive for them (some direct feedback would be handy). The initial backwards incompatible change did not result in any bug reports coming back to us at all. If there were lots of users using it I think we could have expected some complaints as they would have had to adapt their programs to no longer manually add the flavor access (otherwise that would fail). It is of course possible that new programs written in the meantime would rely on the new behaviour. I think (please correct me if I'm wrong) the public CD clouds don't expose that part of the API to their users so the fallout could be quite limited. Some opinions from those who do CD for private clouds would be very useful. I'll send an email to openstack-operators asking what people there believe the impact would be but at the moment I'm thinking that revert is the way we should go. Could we consider a middle road? What if we made the extension silently tolerate an add-myself operation to a flavor, (potentially only) right after create? 
Yes, that's another change, but it means that old clients (like horizon) will continue to work, and new clients (which expect to automatically get access) will continue to work. We can document in the release notes that we made the change to match our docs, and that anyone that *depends* on the (admittedly weird) behavior of the old broken extension, where a user doesn't retain access to flavors they create, may need to tweak their client to remove themselves after create. My concern is that we'd be digging ourselves an even deeper hole with that approach: that for some reason we don't really understand at the moment, people have programs which rely on adding flavor access to a tenant which is already on the access list being rejected rather than silently accepted. And I'm not sure it's the behavior from flavor access that we actually want. But we certainly don't want to end up in the situation of trying to work out how to roll back two backwards incompatible API changes. Chris Nope. IMO we should just accept that an incompatible change was made that should not have been, revert it, and move on. I hope that saying our code base is going to support CD does not mean that any incompatible change that slips through our very limited gate cannot be reverted. October was a while back but I'm not sure what principle we would use to draw the line. I am also not sure why this is phrased as a CD vs. not issue. Are the *users* of a system that happens to be managed using CD thought to be more tolerant of their code breaking? Perhaps it would be a good time to review https://wiki.openstack.org/wiki/Governance/Approved/APIStability and the details of https://wiki.openstack.org/wiki/APIChangeGuidelines to make sure they still reflect the will of the TC and our community. 
-David ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
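For concreteness, the "middle road" proposed in this thread — tolerating an add-myself call instead of rejecting it — amounts to idempotent set semantics on the access list. A toy model with hypothetical names, not the real extension's API:

```python
# Illustration of the compatibility argument: if the creator already
# gets access automatically (new behavior), an old client's explicit
# add-access call right after create becomes a harmless no-op instead
# of an error, so both client generations keep working.
class FlavorAccess:
    def __init__(self):
        self._access = {}

    def create_flavor(self, flavor, creator):
        # new behavior: the creating tenant gets access automatically
        self._access[flavor] = {creator}

    def add_access(self, flavor, tenant):
        # old clients call this right after create; set semantics make
        # the duplicate add idempotent rather than an error
        self._access.setdefault(flavor, set()).add(tenant)
        return sorted(self._access[flavor])


fa = FlavorAccess()
fa.create_flavor('m1.small', 'tenant-a')
print(fa.add_access('m1.small', 'tenant-a'))  # ['tenant-a'] -- no error
```

The counter-argument in the thread is that some callers may rely on the duplicate add being *rejected*, which is exactly why silently accepting it is itself another behavior change.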
Re: [openstack-dev] Constructive Conversations
On Mar 18, 2014, at 11:57 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote: […] Not to detract from what you're saying, but this is 'meh' to me. My company has some different kind of values thing every 6 months it seems and maybe it's just me but I never really pay attention to any of it. I think I have to put something on my annual goals/results about it, but it's just fluffy wording. To me this is a self-policing community: if someone is being a dick, the others should call them on it, or the PTL for the project should stand up against it and set the tone for the community and culture his project wants to have. That's been my experience at least. Maybe some people would find codifying this helpful, but there are already lots of wikis and things that people can't remember on a daily basis so adding another isn't probably going to help the problem. Bullies don't tend to care about codes, but if people stand up against them in public they should be outcast. I agree with the goals and sentiment of Kurt’s message. But, just to add a little to Matt’s reply: Let’s face it. Everyone has a bad day now and then. It’s easier for some people to lose their cool than others. Nothing’s going to change that. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Some thoughts on the nova-specs design process
On Mar 16, 2014, at 7:58 PM, Michael Still mi...@stillhq.com wrote: Hi. So I've written a blueprint for nova for Juno, and uploaded it to nova-specs (https://review.openstack.org/#/c/80865/). That got me thinking about what this process might look like, and this is what I came up with: * create a launchpad blueprint * you write a proposal in the nova-specs repo * add the blueprint to the commit message of the design proposal, and send the design proposal off for review * advertise the existence of the design proposal to relevant stake holders (other people who hack on that bit of the code, operators mailing list if relevant, etc) * when the proposal is approved, it merges into the nova-specs git repo and nova-drivers then mark the launchpad blueprint as approved * off you go with development as normal This has the advantage that there's always a launchpad blueprint, and that the spec review is associated with that blueprint. That way someone who finds the launchpad blueprint but wants to see the actual design proposal can easily do so because it is linked as an addressed by review on the blueprint. Thoughts? Makes sense to me. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] RFC - using Gerrit for Nova Blueprint review approval
On Mar 6, 2014, at 11:09 AM, Russell Bryant rbry...@redhat.com wrote: […] I think a dedicated git repo for this makes sense. openstack/nova-blueprints or something, or openstack/nova-proposals if we want to be a bit less tied to launchpad terminology. +1 to this whole idea.. and we definitely should have a dedicated repo for this. I’m indifferent to its name. :) Either one of those works for me. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Thought exercise for a V2 only world
On Mar 4, 2014, at 4:09 AM, Sean Dague s...@dague.net wrote: On 03/04/2014 01:14 AM, Chris Behrens wrote: […] I don’t think I have an answer, but I’m going to throw out some of my random thoughts about extensions in general. They might influence a longer term decision. But I’m also curious if I’m the only one that feels this way: I tend to feel like extensions should start outside of nova and any other code needed to support the extension should be implemented by using hooks in nova. The modules implementing the hook code should be shipped with the extension. If hooks don’t exist where needed, they should be created in trunk. I like hooks. Of course, there’s probably such a thing as too many hooks, so… hmm… :) Anyway, this addresses another annoyance of mine whereby code for extensions is mixed in all over the place. Is it really an extension if all of the supporting code is in ‘core nova’? That said, I then think that the only extensions shipped with nova are really ones we deem “optional core API components”. “optional” and “core” are probably oxymorons in this context, but I’m just going to go with it. There would be some sort of process by which we let extensions “graduate” into nova. Like I said, this is not really an answer. But if we had such a model, I wonder if it turns “deprecating extensions” into something more like “deprecating part of the API”… something less likely to happen. Extensions that aren’t used would more likely just never graduate into nova. So this approach actually really concerns me, because what it says is that we should be optimizing Nova for out of tree changes to the API which are vendor specific. Which I think is completely the wrong direction. Because in that world you'll never be able to move between Nova installations. What's worse is you'll get multiple people implementing the same feature out of tree, slightly differently. Right. And I have an internal conflict because I also tend to agree with what you’re saying. 
:) But I think that if we have API extensions at all, we have your issue of “never being able to move”. Well, maybe not “never”, because at least they’d be easy to “turn on” if they are in nova. But I think for the random API extension that only 1 person ever wants to enable, there’s your same problem. This is somewhat off-topic, but I just don’t want a ton of bloat in nova for something few people use. I 100% agree the current extensions approach is problematic. It's used as a way to circumvent the idea of a stable API (mostly with oh, it's an extension, we need this feature right now, and it's not part of core so we don't need to give the same guaruntees.) Yeah, totally.. that’s bad. So realistically I want to march us towards a place where we stop doing that. Nova out of the box should have all the knobs that anyone needs to build these kinds of features on top of. If not, we should fix that. It shouldn't be optional. Agree, although I’m not sure if I’m reading this correctly as it sounds like you want the knobs that you said above concern you. I want some sort of balance. There’s extensions I think absolutely should be part of nova as optional features… but I don’t want everything. :) - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Thought exercise for a V2 only world
On Mar 4, 2014, at 11:14 AM, Sean Dague s...@dague.net wrote: I want to give the knobs to the users. If we thought it was important enough to review and test in Nova, then we made a judgement call that people should have access to it. Oh, I see. But, I don’t agree, certainly not for every single knob. It’s less of an issue in the private cloud world, but when you start offering this as a service, not everything is appropriate to enable. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Thought exercise for a V2 only world
On Mar 3, 2014, at 9:23 PM, Joe Gordon joe.gord...@gmail.com wrote: Hi All, here's a case worth exploring in a v2 only world ... what about some extension we really think is dead and should go away? can we ever remove it? In the past we have said backwards compatibility means no we cannot remove any extensions, if we adopt the v2 only notion of backwards compatibility is this still true? I don’t think I have an answer, but I’m going to throw out some of my random thoughts about extensions in general. They might influence a longer term decision. But I’m also curious if I’m the only one that feels this way: I tend to feel like extensions should start outside of nova and any other code needed to support the extension should be implemented by using hooks in nova. The modules implementing the hook code should be shipped with the extension. If hooks don’t exist where needed, they should be created in trunk. I like hooks. Of course, there’s probably such a thing as too many hooks, so… hmm… :) Anyway, this addresses another annoyance of mine whereby code for extensions is mixed in all over the place. Is it really an extension if all of the supporting code is in ‘core nova’? That said, I then think that the only extensions shipped with nova are really ones we deem “optional core API components”. “optional” and “core” are probably oxymorons in this context, but I’m just going to go with it. There would be some sort of process by which we let extensions “graduate” into nova. Like I said, this is not really an answer. But if we had such a model, I wonder if it turns “deprecating extensions” into something more like “deprecating part of the API”… something less likely to happen. Extensions that aren’t used would more likely just never graduate into nova. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
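A minimal sketch of the hook registry idea floated here — illustrative only, not Nova's actual hook mechanism: core code invokes named hook points, and an out-of-tree extension ships a module that registers callables against them at load time.

```python
# Toy hook registry (hypothetical names). "Core" code calls run_hooks()
# at well-defined points; it is a no-op when nothing is registered, so
# the extension's supporting code lives entirely outside the core tree.
_hooks = {}


def add_hook(name, fn):
    _hooks.setdefault(name, []).append(fn)


def run_hooks(name, *args, **kwargs):
    # called by core at a named hook point; returns each hook's result
    return [fn(*args, **kwargs) for fn in _hooks.get(name, [])]


# What an extension's module would do when loaded:
add_hook('instance_create', lambda name: 'audit: created %s' % name)

print(run_hooks('instance_create', 'vm-1'))  # ['audit: created vm-1']
```

The "too many hooks" worry in the email maps to the registry growing an unbounded set of hook-point names that core must then treat as a stable interface.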
Re: [openstack-dev] [nova] Future of the Nova API
This thread is many messages deep now and I’m busy with a conference this week, but I wanted to carry over my opinion from the other “v3 API in Icehouse” thread and add a little to it. Bumping versions is painful. v2 is going to need to live for “a long time” to create the least amount of pain. I would think that at least anyone running a decent sized Public Cloud would agree, if not anyone just running any sort of decent sized cloud. I don’t think there’s a compelling enough reason to deprecate v2 and cause havoc with what we currently have in v3. I’d like us to spend more time on the proposed “tasks” changes. And I think we need more time to figure out if we’re doing versioning in the correct way. If we’ve got it wrong, a v3 doesn’t fix the problem and we’ll just be causing more havoc with a v4. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Future of the Nova API
Again, just another quick response, but if we can find a way to merge v2 into the current v3 code, so that we don't have dual maintenance, that would be really nice. On Feb 26, 2014, at 5:15 PM, Christopher Yeoh cbky...@gmail.com wrote: On Wed, 26 Feb 2014 16:04:38 -0600 Chris Behrens cbehr...@codestud.com wrote: This thread is many messages deep now and I’m busy with a conference this week, but I wanted to carry over my opinion from the other “v3 API in Icehouse” thread and add a little to it. Bumping versions is painful. v2 is going to need to live for “a long time” to create the least amount of pain. I would think that at least anyone running a decent sized Public Cloud would agree, if not anyone just running any sort of decent sized cloud. I don’t think there’s a compelling enough reason to deprecate v2 and cause havoc with what we currently have in v3. I’d like us to spend more time on the proposed “tasks” changes. And I think we need more time to figure out if we’re doing versioning in the correct way. If we’ve got it wrong, a v3 doesn’t fix the problem and we’ll just be causing more havoc with a v4. So I guess I agree tasks is something we should develop further and that makes significant non backwards compatible changes to the API - which is the major reason why we delayed V3. And its really important that we get those changes right so we don't need a v4. However, keeping V3 experimental indefinitely doesn't actually remove the dual maintenance burden. The only way to do that is eventually remove either the V2 or V3 version or do the suggested backport. We've pretty well established that starting a fresh v3 API is a multi cycle effort. If we remove the V3 api code in Juno and then start working on a new major version bump at a later date at say L or M it'll be another multi cycle effort which I doubt would be feasible, especially with people knowing there is the real risk at the end that it'll just get thrown away. 
And the alternative of not removing V3 leaves the extra maintenance burden. So whilst I agree with making sure we get it right, I'm wondering exactly what you mean by taking more time to figure out what we're doing - is it removing the V3 API code and just coping with extra maintenance burden? Or removing it and then trying to do a big multi cycle effort again a few cycles down the track? Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] v3 API in Icehouse
+1. I'd like to leave it experimental as well. I think the task work is important to the future of nova-api and I'd like to make sure we're not rushing anything. We're going to need to live with old API versions for a long time, so it's important that we get it right. I'm also not convinced there's a compelling enough reason for one to move to v3 as it is. Extension versioning is important, but I'm not sure it can't be backported to v2 in the meantime. - Chris On Feb 19, 2014, at 9:36 AM, Russell Bryant rbry...@redhat.com wrote: Greetings, The v3 API effort has been going for a few release cycles now. As we approach the Icehouse release, we are faced with the following question: Is it time to mark v3 stable? My opinion is that I think we need to leave v3 marked as experimental for Icehouse. There are a number of reasons for this: 1) Discussions about the v2 and v3 APIs at the in-person Nova meetup last week made me come to the realization that v2 won't be going away *any* time soon. In some cases, users have long term API support expectations (perhaps based on experience with EC2). In the best case, we have to get all of the SDKs updated to the new API, and then get to the point where everyone is using a new enough version of all of these SDKs to use the new API. I don't think that's going to be quick. We really don't want to be in a situation where we're having to force any sort of migration to a new API. The new API should be compelling enough that everyone *wants* to migrate to it. If that's not the case, we haven't done our job. 2) There's actually quite a bit still left on the existing v3 todo list. We have some notes here: https://etherpad.openstack.org/p/NovaV3APIDoneCriteria One thing is nova-network support. Since nova-network is still not deprecated, we certainly can't deprecate the v2 API without nova-network support in v3. We removed it from v3 assuming nova-network would be deprecated in time. 
Another issue is that we discussed the tasks API as the big new API feature we would include in v3. Unfortunately, it's not going to be complete for Icehouse. It's possible we may have some initial parts merged, but it's much smaller scope than what we originally envisioned. Without this, I honestly worry that there's not quite enough compelling functionality yet to encourage a lot of people to migrate. 3) v3 has taken a lot more time and a lot more effort than anyone thought. This makes it even more important that we're not going to need a v4 any time soon. Due to various things still not quite wrapped up, I'm just not confident enough that what we have is something we all feel is Nova's API of the future. Let's all take some time to reflect on what has happened with v3 so far and what it means for how we should move forward. We can regroup for Juno. Finally, I would like to thank everyone who has helped with the effort so far. Many hours have been put in to code and reviews for this. I would like to specifically thank Christopher Yeoh for his work here. Chris has done an *enormous* amount of work on this and deserves credit for it. He has taken on a task much bigger than anyone anticipated. Thanks, Chris! -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Sent the first batch of invitations to Atlanta's Summit
On Jan 28, 2014, at 12:45 PM, Stefano Maffulli stef...@openstack.org wrote: A few minutes ago we sent the first batch of invites to people who contributed to any of the official OpenStack programs[1] from 00:00 UTC on April 4, 2014 (Grizzly release day) until present. Something tells me that this date is not correct? :) - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Asynchronous programming: replace eventlet with asyncio
On Feb 7, 2014, at 8:21 AM, Jesse Noller jesse.nol...@rackspace.com wrote: It seems that baking concurrency models into the individual clients / services adds some opinionated choices that may not scale, or fit the needs of a large-scale deployment. This is one of the things looking at the client tools I’ve noticed - don’t dictate a concurrency backend, treat it as producer/consumer/message passing and you end up with something that can potentially scale out a lot more. I agree, and I think we should do this with our own clients. However, on the service side, there are a lot of 3rd party modules that would need the support as well. libvirt, xenapi, pyamqp, qpid, kombu (sits on pyamqp), etc, come to mind as the top possibilities. I was also going to change direction in this reply and say that we should back up and come up with a basic set of requirements. In this thread, I think I’ve only seen arguments against various technology choices without a clear list of our requirements. Since Chuck has posted in the meantime, I’m going to list what I view should be some of our requirements in reply to him. - Chris
Re: [openstack-dev] Asynchronous programming: replace eventlet with asyncio
I want to address some of Chuck’s post, but I think we should come up with a list of requirements. Replies to Chuck inline, and then some requirements below: On Feb 7, 2014, at 8:38 AM, Chuck Thier cth...@gmail.com wrote: Concurrency is hard, let's blame the tools! Any lib that we use in python is going to have a set of trade-offs. Looking at a couple of the options on the table: 1. Threads: Great! code doesn't have to change too much, but now that code *will* be preempted at any time, so now we have to worry about locking and we have even more race conditions that are difficult to debug. Yes. I mean, as was pointed out earlier in this thread, there are also some gotchas when using eventlet, but there are a lot of cases that you 100% know will not result in a context switch. We’ve been able to avoid locks for this reason. (Although I also feel like if there’s cases where locking would be necessary when using Threads, we should look at how we can re-factor to avoid them. It tends to mean we’re sharing too much globally.) Besides the locking issue, our current model of creating a million greenthreads would not work well if we simply converted them to Threads. Our processes are already using way too much memory as it is (a separate issue that needs investigation). This becomes even worse if we only support async by using worker processes, as was suggested and commented on earlier in this thread. 2. Asyncio: Explicit FTW! Except now that big list of dependencies has to also support the same form of explicit concurrency. This is a trade-off that twisted makes as well. Any library that might block has to have a separate library made for it. We could dig deeper, but hopefully you see what I mean. Changing tools may solve one problem, but at the same time introduce a different set of problems. Yeah, exactly what I was trying to point out last night in my quick reply before bed. :) This should really be amended to say ‘not monkey-patching’ instead of ‘asyncio’. 
I realized that as soon as I hit Send last night. An implementation that would monkey patch and use asyncio underneath doesn’t have this issue. I think the biggest issue with using Eventlet is that developers want to treat it like magic, and you can't do that. If you are monkey patching the world, then you are doing it wrong. How about we take a moment to learn how to use the tools we have effectively, rather than just blaming them. Many projects have managed to use Eventlet effectively (including some in OpenStack). In general, I agree with the ‘monkey patching the world’ statement. Except that tests are exempt from that argument. ;) But it may be a necessary evil. Eventlet isn't perfect, but it has gotten us quite a ways. If you do choose to use another library, please make sure you are trading for the right set of problems. Which is what leads me to wanting us to get a list of our requirements before we make any decisions:
1) Socket/fifo/pipe I/O cannot block ‘other work’.
2) Currently executing code that has the potential to block for long periods of time needs the ability to easily yield for ‘other work’ to be done. This statement is general, but I’m thinking about file I/O here. For example, if a block of code needs to copy a large file, it needs to be able to yield now and then.
3) Semaphores/locks/etc cannot block ‘other work’ that is not trying to acquire the same lock.
4) OS calls such as ‘wait’ or ‘waitpid’ need to not block ‘other work’.
5) The solution needs to perform reasonably well.
6) The solution needs to be reasonably resource efficient.
7) The solution needs to fulfill the above requirements even when using 3rd party modules.
8) Clients and libraries that we produce need to support the above in a way that arbitrary implementations could be used.
I’m debating whether File I/O in #2 should be combined with #1 such that #1 becomes ‘any I/O’. I might only be separating File I/O out by thinking about possible implementations.
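[Editor's note: the "yield now and then" behavior in requirement 2 can be sketched with plain stdlib generators acting as cooperative micro-threads. This is purely an illustration of the scheduling property the requirements describe -- not eventlet, asyncio, or any real OpenStack code; all names here are invented.]

```python
# Toy illustration of requirement 2: a long-running "file copy"
# periodically yields so 'other work' is never starved.
from collections import deque

def copy_chunks(n_chunks, log):
    # Pretend each iteration copies one chunk of a large file.
    for i in range(n_chunks):
        log.append(('copy', i))
        yield  # yield now and then so other work can run

def other_work(n, log):
    for i in range(n):
        log.append(('other', i))
        yield

def run(tasks):
    # Trivial round-robin scheduler over generator-based tasks.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # task yielded; give it another turn later
        except StopIteration:
            pass  # task finished

log = []
run([copy_chunks(3, log), other_work(2, log)])
print(log)  # work is interleaved; the copy never monopolizes the scheduler
```

Requirement 1 (non-blocking socket I/O) and requirement 3 (locks only blocking their own contenders) are what eventlet and asyncio each provide in their own way on top of this basic yielding model.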
And I’ve probably missed something. Anyway, I have opinions on what does and doesn’t satisfy the above, but I’ll reply separately. :) - Chris
Re: [openstack-dev] Asynchronous programming: replace eventlet with asyncio
On Feb 7, 2014, at 2:59 PM, Victor Stinner victor.stin...@enovance.com wrote: I don't see why external libraries should be modified. Only the few libraries sending HTTP queries and requests to the database should handle asyncio. Dummy example: the iso8601 module used to parse time doesn't need to be aware of asyncio. When talking to libvirt, we don't want to block. When we're waiting on rabbit or qpid, we don't want to block. When we talk to XenAPI, we don't want to block. These are all 3rd party modules. We'd have to convert these all to work via a Thread pool, or we would have to monkey patch them like we do today. - Chris
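[Editor's note: the thread-pool conversion Chris describes can be sketched with stdlib asyncio. `blocking_lookup` below is a stand-in for a blocking third-party call such as libvirt -- it is not a real libvirt/XenAPI API.]

```python
# Sketch: push a blocking 3rd-party call onto a thread pool so the
# event loop can keep servicing other coroutines instead of stalling.
import asyncio
import time

def blocking_lookup():
    # Stand-in for libvirt/XenAPI talking to a hypervisor synchronously.
    time.sleep(0.05)
    return 'domain-info'

async def main():
    loop = asyncio.get_running_loop()
    # Awaiting run_in_executor hands the blocking call to the default
    # ThreadPoolExecutor; calling blocking_lookup() directly here would
    # block every coroutine on this loop.
    return await loop.run_in_executor(None, blocking_lookup)

print(asyncio.run(main()))  # -> domain-info
```

Monkey patching (the eventlet approach) avoids this per-call wrapping, which is the trade-off being debated in the thread.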
Re: [openstack-dev] Asynchronous programming: replace eventlet with asyncio
On Feb 6, 2014, at 11:07 PM, Joshua Harlow harlo...@yahoo-inc.com wrote: +1 To give an example as to why eventlet's implicit monkey-patch-the-world approach isn't especially great (although it's what we are currently using throughout OpenStack): the way I think about how it works is to think about what libraries a single piece of code calls and how it is very hard to predict whether that code will trigger an implicit switch (conceptually similar to a context switch). Conversely, switching to asyncio means that every single module call that would have blocked before monkey patching… will now block. What is worse? :) - Chris
Re: [openstack-dev] [keystone][nova] Re: Hierarchical Multitenancy Discussion
Hi Vish, I’m jumping in slightly late on this, but I also have an interest in this. I’m going to preface this by saying that I have not read this whole thread yet, so I apologize if I repeat things, say anything that is addressed by previous posts, or doesn’t jive with what you’re looking for. :) But what you describe below sounds like exactly a use case I’d come up with. Essentially I want another level above project_id. Depending on the exact use case, you could name it ‘wholesale_id’ or ‘reseller_id’...and yeah, ‘org_id’ fits in with your example. :) I think that I had decided I’d call it ‘domain’ to be more generic, especially after seeing keystone had a domain concept. Your idea below (prefixing the project_id) is exactly one way I thought of doing this to be least intrusive. I, however, thought that this would not be efficient. So, I was thinking about proposing that we add ‘domain’ to all of our models. But that limits your hierarchy and I don’t necessarily like that. :) So I think that if the queries are truly indexed as you say below, you have a pretty good approach. The one issue that comes to mind is whether there’s any chance of collision. For example, if project ids (or orgs) could contain a ‘.’, then ‘.’ as a delimiter won’t work. My requirements could be summed up pretty well by thinking of this as ‘virtual clouds within a cloud’. Deploy a single cloud infrastructure that could look like many multiple clouds. ‘domain’ would be the key into each different virtual cloud. Accessing one virtual cloud doesn’t reveal any details about another virtual cloud. What this means is: 1) domain ‘a’ cannot see instances (or resources in general) in domain ‘b’. It doesn’t matter if domain ‘a’ and domain ‘b’ share the same tenant ID. If you act with the API on behalf of domain ‘a’, you cannot see your instances in domain ‘b’. 2) Flavors per domain. domain ‘a’ can have different flavors than domain ‘b’. 3) Images per domain.
domain ‘a’ could see different images than domain ‘b’. 4) Quotas and quota limits per domain. your instances in domain ‘a’ don’t count against quotas in domain ‘b’. 5) Go as far as using different config values depending on what domain you’re using. This one is fun. :) etc. I’m not sure if you were looking to go that far or not. :) But I think that our ideas are close enough, if not exact, that we can achieve both of our goals with the same implementation. I’d love to be involved with this. I am not sure that I currently have the time to help with implementation, however. - Chris On Feb 3, 2014, at 1:58 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: Hello Again! At the meeting last week we discussed some options around getting true multitenancy in nova. The use case that we are trying to support can be described as follows: Martha, the owner of ProductionIT, provides IT services to multiple Enterprise clients. She would like to offer cloud services to Joe at WidgetMaster, and Sam at SuperDevShop. Joe is a Development Manager for WidgetMaster and he has multiple QA and Development teams with many users. Joe needs the ability to create users, projects, and quotas, as well as the ability to list and delete resources across WidgetMaster. Martha needs to be able to set the quotas for both WidgetMaster and SuperDevShop; manage users, projects, and objects across the entire system; and set quotas for the client companies as a whole. She also needs to ensure that Joe can't see or mess with anything owned by Sam. As per the plan I outlined in the meeting I have implemented a Proof-of-Concept that would allow me to see what changes were required in nova to get scoped tenancy working. I used a simple approach of faking out hierarchy by prepending the id of the larger scope to the id of the smaller scope. Keystone uses uuids internally, but for ease of explanation I will pretend like it is using the name.
I think we can all agree that ‘orga.projecta’ is more readable than ‘b04f9ea01a9944ac903526885a2666dec45674c5c2c6463dad3c0cb9d7b8a6d8’. The code basically creates the following five projects: orga orga.projecta orga.projectb orgb orgb.projecta I then modified nova to replace everywhere where it searches or limits policy by project_id to do a prefix match. This means that someone using project ‘orga’ should be able to list/delete instances in orga, orga.projecta, and orga.projectb. You can find the code here: https://github.com/vishvananda/devstack/commit/10f727ce39ef4275b613201ae1ec7655bd79dd5f https://github.com/vishvananda/nova/commit/ae4de19560b0a3718efaffb6c205c7a3c372412f Keep in mind that this is a prototype, but I’m hoping to come to some kind of consensus as to whether this is a reasonable approach. I’ve compiled a list of pros and cons. Pros: * Very easy to understand * Minimal changes to nova * Good performance in db (prefix matching uses indexes)
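[Editor's note: the prefix-matching idea Vish describes can be sketched with stdlib sqlite3. The table, column, and project names below are illustrative, not the real nova schema.]

```python
# Sketch of scoped listing via project_id prefix match: acting as 'orga'
# sees orga itself plus its sub-projects, while orgb's stay invisible.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE instances (id INTEGER, project_id TEXT)')
conn.executemany('INSERT INTO instances VALUES (?, ?)', [
    (1, 'orga'), (2, 'orga.projecta'), (3, 'orga.projectb'),
    (4, 'orgb'), (5, 'orgb.projecta'),
])
# An index lets the LIKE 'prefix%' scan use a range lookup, which is
# the "good performance in db" point from the pros list.
conn.execute('CREATE INDEX ix_project ON instances (project_id)')

scope = 'orga'
rows = conn.execute(
    "SELECT id FROM instances"
    " WHERE project_id = ? OR project_id LIKE ? ORDER BY id",
    (scope, scope + '.%')).fetchall()
print([r[0] for r in rows])  # -> [1, 2, 3]
```

Note this sketch inherits the collision caveat raised earlier in the thread: if a project id may itself contain '.', the delimiter (or the ids) has to be restricted.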
Re: [openstack-dev] [Nova][Scheduler] Will the Scheduler use Nova Objects?
On Jan 30, 2014, at 5:55 AM, Andrew Laski andrew.la...@rackspace.com wrote: I'm of the opinion that the scheduler should use objects, for all the reasons that Nova uses objects, but that they should not be Nova objects. Ultimately what the scheduler needs is a concept of capacity, allocations, and locality of resources. But the way those are modeled doesn't need to be tied to how Nova does it, and once the scope expands to include Cinder it may quickly turn out to be limiting to hold onto Nova objects. +2!
Re: [openstack-dev] [keystone][nova] Re: Hierarchical Multitenancy Discussion
On Feb 5, 2014, at 9:13 AM, Tiwari, Arvind arvind.tiw...@hp.com wrote: Hi Chris, Looking at your requirements, it seems my solution (see attached email) is pretty much aligned. What I am trying to propose is 1. One root domain as owner of the virtual cloud. Logically linked to n leaf domains. 2. All leaf domains fall under the admin boundary of the virtual cloud owner. 3. No sharing of resources at project level, which will keep the authorization model simple. 4. No sharing of resources at domain level either. 5. Hierarchy or admin boundary will be totally governed by roles. This way we can set up a true virtual cloud/Reseller/wholesale model. Thoughts? Yeah, sounds the same, although we should clarify what 'resources' means (I used the term without completely clarifying it as well :). For example, a physical host is a resource, but I fully intend for it to be shared in that it will run VMs for multiple domains. So, by resources, I mean things like instances, images, networks, although I would also want the flexibility to be able to share images/networks between domains. Here's my larger thought process which led me to these features/requirements: Within a large company, you will find that you need to provide many discrete clouds to different organizations within the company. Each organization potentially has different requirements when it comes to flavors, images, networks, and even config options. The only current option is to set up 'x' completely separate OpenStack installs. This can be completely cost ineffective. Instead of doing this, I want to build 1 big cloud. The benefits are: 1) You don't have 'x' groups maintaining 'y' platforms. This results in saving time and saving money on people. 2) Creating a new cloud for a new organization takes seconds. 3) You can have a huge cost savings on hardware as it is all shared. and so forth. And yes, this exact same model is what Service Providers should want if they intend to Resell/Co-brand, etc.
- Chris Thanks, Arvind -Original Message- From: Chris Behrens [mailto:cbehr...@codestud.com] Sent: Wednesday, February 05, 2014 1:27 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [keystone][nova] Re: Hierarchical Multitenancy Discussion [snip -- original message quoted in full earlier in the thread]
Re: [openstack-dev] [keystone][nova] Re: Hierarchical Multitenancy Discussion
On Feb 5, 2014, at 3:38 AM, Vishvananda Ishaya vishvana...@gmail.com wrote: On Feb 5, 2014, at 12:27 AM, Chris Behrens cbehr...@codestud.com wrote: 1) domain ‘a’ cannot see instances (or resources in general) in domain ‘b’. It doesn’t matter if domain ‘a’ and domain ‘b’ share the same tenant ID. If you act with the API on behalf of domain ‘a’, you cannot see your instances in domain ‘b’. 2) Flavors per domain. domain ‘a’ can have different flavors than domain ‘b’. I hadn’t thought of this one, but we do have per-project flavors so I think this could work in a project hierarchy world. We might have to rethink the idea of global flavors and just stick them in the top-level project. That way the flavors could be removed. The flavor list would have to be composed by matching all parent projects. It might make sense to have an option for flavors to be “hidden” in sub projects somehow as well. In other words if orgb wants to delete a flavor from the global list they could do it by hiding the flavor. Definitely some things to be thought about here. Yeah, it's completely do-able in some way. The per-project flavors is a good start. 3) Images per domain. domain ‘a’ could see different images than domain ‘b’. Yes this would require similar hierarchical support in glance. Yup :) 4) Quotas and quota limits per domain. your instances in domain ‘a’ don’t count against quotas in domain ‘b’. Yes we’ve talked about quotas for sure. This is definitely needed. Also: not really related to this, but if we're making considerable quota changes, I would also like to see the option for separate quotas _per flavor_, even. :) 5) Go as far as using different config values depending on what domain you’re using. This one is fun. :) Curious for some examples here. With the idea that I want to be able to provide multiple virtual clouds within 1 big cloud, these virtual clouds may desire different config options.
I'll pick one that could make sense: # When set, compute API will consider duplicate hostnames # invalid within the specified scope, regardless of case. # Should be empty, project or global. (string value) #osapi_compute_unique_server_name_scope= This is the first one that popped into my mind for some reason, and it turns out that this is actually a more complicated example than I was originally intending. I left it here, because there might be a potential issue with this config option when using 'org.tenant' as project_id. Ignoring that, let's say this config option had a way to say I don't want duplicate hostnames within my organization at all, I don't want any single tenant in my organization to have duplicate hostnames, or I don't care at all about duplicate hostnames. Ideally each organization could have its own config for this. I'd love to be involved with this. I am not sure that I currently have the time to help with implementation, however. Come to the meeting on Friday! 1600 UTC I meant to hit the first one. :-/ I'll try to hit it this week. - Chris
Re: [openstack-dev] Asynchronous programming: replace eventlet with asyncio
Hi, Interesting thread. I have been working on a side project that is a gevent/eventlet replacement [1] that focuses on thread-safety and performance. This came about because of an outstanding bug we have with eventlet not being Thread safe. (We cannot safely enable thread pooling for DB calls so that they will not block.) Unfortunately, I tried to fix the issue while maintaining similar performance but haven’t been completely successful. This led me to believe that it was reasonable to work on an alternative micro-thread implementation on top of greenlet. So, I admit that this might be somewhat of a biased opinion [2], but I think that using a micro-thread implementation is useful. If not for any other reason, the resulting code is very clean and easy to read. It allows you to write code ‘the normal way’. If you have any sort of experience with real threading, it’s really easy to understand. Regardless of direction, I would like to see an oslo abstraction so that we can easily switch out the underlying implementation, potentially even making the choice a config option. I think that means that even if we move to asyncio, our abstraction layer provides something that looks like microthreads. I think that it’s maybe the only common ground that makes sense, and it addresses my concerns above regarding readability and ease of use. - Chris [1] I haven’t made the code public yet, but will shortly. Mostly I was concerned that it looked like a pile of garbage. :) But it’s at a point that this isn’t a concern anymore. [2] I really don’t care if my side project is used w/ OpenStack or not, despite thinking we’d do so. It will have usefulness to others outside of OpenStack, even if only for the 80-90% gains in performance that it seems to have compared to eventlet. Most importantly, it has just been fun. 
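[Editor's note: the oslo-style abstraction Chris proposes -- one spawn/wait interface with the underlying implementation chosen by configuration -- can be sketched roughly as below. Only a stdlib threading backend is shown; the class and function names are invented for illustration, not a real oslo API.]

```python
# Sketch of a pluggable micro-thread abstraction: callers use one
# spawn/wait interface; eventlet, asyncio, or real threads would each
# be a backend registered under a config-selectable name.
import threading

class ThreadBackend:
    def spawn(self, fn, *args):
        t = threading.Thread(target=fn, args=args)
        t.start()
        return t  # opaque handle from the caller's point of view

    def wait(self, handle):
        handle.join()

# An eventlet or asyncio backend would slot in alongside this one.
_BACKENDS = {'threading': ThreadBackend}

def get_backend(name='threading'):
    # 'name' would come from configuration in a real deployment.
    return _BACKENDS[name]()

results = []
be = get_backend()
handle = be.spawn(results.append, 42)
be.wait(handle)
print(results)  # -> [42]
```

Code written against this interface stays readable "normal" code regardless of which concurrency implementation is configured, which is the point being argued for.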
On Feb 4, 2014, at 12:38 PM, victor stinner victor.stin...@enovance.com wrote: Kevin Conway wrote: Switching our async IO management from eventlet to asyncio would not be a trivial task. Tell me when I'm wrong, but it would require that we completely change our programming model from typical, function-call based programming to use generator-iterators for everything. My proposition is to put asyncio on top of greenlet using the greenio project. So the current code can be left unchanged (it will continue to use eventlet) if you don't want to modify it. New code may use the asyncio API instead of the greenlet/eventlet API, but the code will still be executed by greenlet. Or you may have different implementations of the same feature, one for eventlet and another for asyncio. For example, the Oslo Messaging project has an abstraction of the asynchronous framework called an executor. So you can use a blocking executor, eventlet, trollius or something else. Today, a patch was proposed by Joshua Harlow (*) to support concurrent.futures using a pool of threads. I don't know yet how asyncio can be integrated in other projects. I'm just starting with Oslo Messaging :-) The abstraction layer may be moved from Oslo Messaging to Oslo Incubator, so other projects can reuse it. (*) Start adding a futures executor based executor, https://review.openstack.org/#/c/70914/ Victor
Re: [openstack-dev] [TripleO] [Ironic] mid-cycle meetup?
I’d be interested in this. While I have not provided any contributions to Ironic thus far, I’m beginning to look at it for some things. I am local to the bay area, so Sunnyvale is a convenient location for me as well. :) - Chris On Jan 24, 2014, at 5:30 PM, Devananda van der Veen devananda@gmail.com wrote: On Fri, Jan 24, 2014 at 2:03 PM, Robert Collins robe...@robertcollins.net wrote: This was meant to go to -dev, not -operators. Doh. -- Forwarded message -- From: Robert Collins robe...@robertcollins.net Date: 24 January 2014 08:47 Subject: [TripleO] mid-cycle meetup? To: openstack-operat...@lists.openstack.org openstack-operat...@lists.openstack.org Hi, sorry for proposing this at *cough* the mid-way point [christmas shutdown got in the way of internal acks...], but who would come if there was a mid-cycle meetup? I'm thinking the HP sunnyvale office as a venue. -Rob Hi! I'd like to co-locate the Ironic midcycle meetup, as there's a lot of overlap between our team's needs and facilitating that collaboration will be good. I've added the [Ironic] tag to the subject to pull in folks who may be filtering on this project specifically. Please keep us in the loop! Sunnyvale is easy for me, so I'll definitely be there. Cheers, Deva
Re: [openstack-dev] [Nova][Cells] compute api and objects
On Dec 9, 2013, at 2:58 PM, Sam Morrison sorri...@gmail.com wrote: Hi, I’m trying to fix up some cells issues related to objects. Do all compute api methods take objects now? cells is still sending DB objects for most methods (except start and stop) and I know there are more than that. Eg. I know lock/unlock, shelve/unshelve take objects, I assume there are others if not all methods now? I don't think all of them do. As the compute API methods were changing, we were changing the cells code at the same time to not use the generic 'call_compute_api_method' RPC call. It's possible some got missed, however. And in fact, it does look like this is the case. The shelve calls appear to be an example of where things were converted, but the cells code was forgotten. :-/ We'll want to implement new RPC calls in nova/cells/rpcapi that are compatible with the compute_rpcapi calls that are normally used. And then add the appropriate code in nova/cells/manager.py and nova/cells/messaging.py. I can help fix this all up. I guess we'll want to find and file bugs for all of these. It appears you've got a bug filed for unlock… (lock would also be broken, I would think). - Chris
Re: [openstack-dev] [Nova] Proposal to re-add Dan Prince to nova-core
+1 On Nov 26, 2013, at 11:32 AM, Russell Bryant rbry...@redhat.com wrote: Greetings, I would like to propose that we re-add Dan Prince to the nova-core review team. Dan Prince has been involved with Nova since early in OpenStack's history (Bexar timeframe). He was a member of the nova-core review team from May 2011 to June 2013. He has since picked back up with nova reviews [1]. We always say that when people leave nova-core, we would love to have them back if they are able to commit the time in the future. I think this is a good example of that. Please respond with +1s or any concerns. Thanks, [1] http://russellbryant.net/openstack-stats/nova-reviewers-30.txt -- Russell Bryant
Re: [openstack-dev] [nova] Thoughts please on how to address a problem with multiple deletes leading to a nova-compute thread pool problem
On Oct 25, 2013, at 3:46 AM, Day, Phil philip@hp.com wrote: Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen this when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own right, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests when they hit the manager will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages. The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state checker logic to do it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state in the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. Agree. I've argued for a long time that our code should be able to handle the instance disappearing.
We do have a number of places where we catch InstanceNotFound to handle this already. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that are in a DELETING state for more than some timeout. Then the API, knowing that the delete will be processed eventually, can just no-op any further delete requests. We already set to DELETING in the API (unless I'm mistaken -- but I looked at this recently). However, instead of dropping duplicate deletes, I say they should still be sent/handled. Any delete code should be able to handle if another delete is occurring at the same time, IMO… much like how you say other methods should be able to handle an instance disappearing from underneath. If a compute goes down while 'deleting', a 2nd delete later should still be able to function locally. Same thing if the message to compute happens to be lost. iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (ie run some no-op task) - so that if there are no free threads then the service becomes down. That would (eventually) stop the scheduler from sending new requests to it, and make deletes be processed in the API server, but it won't of course help with commands for other instances on the same host. This seems kinda hacky to me. iv) Move away from having a general topic and thread pool for all requests, and start a listener on an instance-specific topic for each running instance on a host (leaving the general topic and pool just for creates and other non-instance calls like the hypervisor API). Then a blocked task would only affect requests for a specific instance. I don't like this one when thinking about scale. 1 million instances == 1 million more queues.
I'm tending towards ii) as a simple and pragmatic solution in the near term, although I like both iii) and iv) as generally good enhancements - but iv) in particular feels like a pretty seismic change. I vote for both i) and ii) at minimum. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
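Option ii)'s periodic cleanup could look roughly like this. This is a sketch with made-up field names and an assumed timeout value, not nova's real periodic-task machinery:

```python
import time

DELETE_TIMEOUT = 600  # seconds; an assumed value, this would be a config option


def reap_stuck_deletes(instances, delete_fn, now=None):
    """Re-drive deletes for instances stuck in DELETING past a timeout.

    Safe to call repeatedly: delete_fn is expected to be idempotent,
    matching the principle that deletes must tolerate concurrent deletes.
    """
    now = time.time() if now is None else now
    reaped = []
    for inst in instances:
        stuck_for = now - inst["updated_at"]
        if inst["task_state"] == "deleting" and stuck_for > DELETE_TIMEOUT:
            delete_fn(inst)
            reaped.append(inst["uuid"])
    return reaped


deleted = []
instances = [
    {"uuid": "a", "task_state": "deleting", "updated_at": 0},    # stuck
    {"uuid": "b", "task_state": "active", "updated_at": 0},      # not deleting
    {"uuid": "c", "task_state": "deleting", "updated_at": 950},  # recent
]
reaped = reap_stuck_deletes(instances, lambda i: deleted.append(i["uuid"]), now=1000)
print(reaped)  # ['a']
```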
Re: [openstack-dev] [Swift] Havana Release Notes Known Issues is talking about Nova (Re: [Openstack] OpenStack 2013.2 (Havana) is released !)
I may have put that in the wrong spot. Oops. On Oct 18, 2013, at 11:11 PM, Akihiro Motoki amot...@gmail.com wrote: Hi Thierry, John, In Havana release notes, Swift known issues section is talking about Nova Cells issue. Could you confirm? https://wiki.openstack.org/wiki/ReleaseNotes/Havana#Known_Issues Thanks, Akihiro On Thu, Oct 17, 2013 at 11:23 PM, Thierry Carrez thie...@openstack.org wrote: Hello everyone, It is my great pleasure to announce the final release of OpenStack 2013.2. It marks the end of the Havana 6-month-long development cycle, which saw the addition of two integrated components (Ceilometer and Heat), the completion of more than 400 feature blueprints and the fixing of more than 3000 reported bugs! You can find source tarballs for each integrated project, together with lists of features and bugfixes, at: OpenStack Compute: https://launchpad.net/nova/havana/2013.2 OpenStack Object Storage: https://launchpad.net/swift/havana/1.10.0 OpenStack Image Service: https://launchpad.net/glance/havana/2013.2 OpenStack Networking: https://launchpad.net/neutron/havana/2013.2 OpenStack Block Storage: https://launchpad.net/cinder/havana/2013.2 OpenStack Identity: https://launchpad.net/keystone/havana/2013.2 OpenStack Dashboard: https://launchpad.net/horizon/havana/2013.2 OpenStack Metering: https://launchpad.net/ceilometer/havana/2013.2 OpenStack Orchestration: https://launchpad.net/heat/havana/2013.2 The Havana Release Notes contain an overview of the key features, as well as upgrade notes and current lists of known issues. You can access them at: https://wiki.openstack.org/wiki/ReleaseNotes/Havana In 19 days, our community will gather in Hong-Kong for the OpenStack Summit: 4 days of conference to discuss all things OpenStack and a Design Summit to plan the next 6-month development cycle, codenamed Icehouse. It's not too late to join us there, see http://www.openstack.org/summit/openstack-summit-hong-kong-2013/ for more details. 
Congratulations to everyone who contributed to this development cycle and participated in making this awesome release possible! -- Thierry Carrez (ttx) ___ Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack Post to : openst...@lists.openstack.org Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Swift] Havana Release Notes Known Issues is talking about Nova (Re: [Openstack] OpenStack 2013.2 (Havana) is released !)
Ah, I know what happened. This is corrected now. - Chris On Oct 19, 2013, at 12:27 AM, Chris Behrens cbehr...@codestud.com wrote: I may have put that in the wrong spot. Oops. On Oct 18, 2013, at 11:11 PM, Akihiro Motoki amot...@gmail.com wrote: Hi Thierry, John, In Havana release notes, Swift known issues section is talking about Nova Cells issue. Could you confirm? https://wiki.openstack.org/wiki/ReleaseNotes/Havana#Known_Issues Thanks, Akihiro ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] TC candidacy
Hi all, I'd like to announce my candidacy for a seat on the OpenStack Technical Committee. - General background - I have over 15 years of experience designing and building distributed systems. I am currently a Principal Engineer at Rackspace, where I have been for a little over 3 years now. Most of my time at Rackspace has been spent working on OpenStack as both a developer and a technical leader. My first week at Rackspace was spent at the very first OpenStack Design Summit in Austin where the project was announced. Prior to working at Rackspace, I held various roles over 14 years at Concentric Network Corporation/XO Communications including Senior Software Architect and eventually Director of Engineering. My main focus there was on an award-winning web/email hosting platform which we'd built to be extremely scalable and fault-tolerant. While my name is not on this patent, I was heavily involved with the development and design that led to US6611861. - Why am I interested? - This is my 3rd time running and I don't want to be considered a failure! But seriously, as I have mentioned in the past, I have strong feelings for OpenStack and I want to help as much as possible to take it to the next level. I have a lot of technical knowledge and experience building scalable distributed systems. I would like to use this knowledge for good, not evil. - OpenStack contributions - As I mentioned above, I was at the very first design summit, so I've been involved with the project from the beginning. I started the initial work for nova-scheduler shortly after the project was opened. I also implemented the RPC support for kombu, making sure to properly support reconnecting and so forth, which didn't work quite so well with the carrot code. I've contributed a number of improvements designed to make nova-api more performant. I've worked on the filter scheduler as well as designing and implementing the first version of the Zones replacement that we named 'Cells'. 
And most recently, I was involved in the design and implementation of the unified objects code in nova. During Icehouse, I'm hoping to focus on performance and stabilization while also helping to finish objects conversion. - Summary - I feel my years of experience contributing to and leading large-scale technical projects, along with my knowledge of the OpenStack projects, will provide a good foundation for technical leadership. Thanks, - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo.db] Proposal: Get rid of deleted column
On Aug 20, 2013, at 12:51 PM, Ed Leafe e...@openstack.org wrote: On Aug 20, 2013, at 2:33 PM, Chris Behrens cbehr...@codestud.com wrote: For instances table, we want to make sure 'uuid' is unique. But we can't put a unique constraint on that alone. If that instance gets deleted.. we should be able to create another entry with the same uuid without a problem. So we need a unique constraint on uuid+deleted. But if 'deleted' is only 0 or 1… we can only have 1 entry deleted and 1 entry not deleted. Using deleted=`id` to mark deletion solves that problem. You could use deleted_at… but 2 creates and deletes within the same second would not work. :) This creates another problem if you ever need to delete this second instance, because now you have two with the same uuid and the same deleted status. Not with the setting of 'deleted' to the row's `id` on delete… since `id` is unique. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
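The uuid+deleted scheme can be demonstrated with a toy table. This uses sqlite purely for illustration (nova's real schema goes through SQLAlchemy models):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instances (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        uuid TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0,
        UNIQUE (uuid, deleted)
    )
""")


def soft_delete(uuid):
    # Setting deleted = id (instead of 1) keeps (uuid, deleted) unique
    # across any number of create/delete cycles for the same uuid.
    conn.execute(
        "UPDATE instances SET deleted = id WHERE uuid = ? AND deleted = 0",
        (uuid,))


# Two create/delete cycles with the same uuid, all within "one second".
conn.execute("INSERT INTO instances (uuid) VALUES ('abc')")
soft_delete("abc")
conn.execute("INSERT INTO instances (uuid) VALUES ('abc')")
soft_delete("abc")

rows = conn.execute(
    "SELECT id, deleted FROM instances WHERE uuid = 'abc' ORDER BY id").fetchall()
print(rows)  # [(1, 1), (2, 2)]: each deleted row keeps a distinct pair
```

With a 0/1 `deleted` column, the second soft_delete would hit a UNIQUE violation, because a (uuid, 1) row would already exist.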
Re: [openstack-dev] [oslo.db] Proposal: Get rid of deleted column
On Aug 20, 2013, at 1:05 PM, Jay Pipes jaypi...@gmail.com wrote: I see the following use case: 1) Create something with a unique name within your tenant 2) Delete that 3) Create something with the same unique name immediately after As a pointless and silly use case that we should not cater to. It's made the database schema needlessly complex IMO and added columns to a unique constraint that make a DBA's job more complex in order to fulfill a use case that really isn't particularly compelling. I was having a convo on IRC with Boris and stated the use case in different terms: If you delete your Gmail email address, do you expect to immediately be able to create a new Gmail email with the previous address? If you answer yes, then this unique constraint on the deleted column makes sense to you. If you answer no, then the whole thing seems like we've spent a lot of effort on something that isn't particularly useful except in random test cases that try to create and delete the same thing in rapid succession. And IMO, those kinds of test cases should be deleted -- hard-deleted. I would answer 'no' to the gmail question. I would answer 'yes' depending on what other things we may talk about. If we put (or maybe we have this -- I didn't check) unique constraints on the metadata table for metadata key… It would be rather silly to not allow someone to reset some metadata with the same key immediately. One could argue that we just un-delete the former row and update it, however… but I think that breaks archiving (something *I'm* not a fan of ;) - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [oslo.db] Proposal: Get rid of deleted column
On Aug 20, 2013, at 3:29 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: c) is going to take a while. There are still quite a few places in nova, for example, that depend on accessing deleted records. Do you have a list of these places? No. I believe Joe Gordon did an initial look long ago. Off the top of my head I remember flavors and the simple-usage extension use them. Yeah, flavors is a problem still, I think. Although we've moved towards fixing most of it. Unfortunately the API supports showing some amount of deleted instances if you specify 'changes-since'. Although since I don't think 'some amount' is really quantified, we may be able to ignore that. We should make that go away in v3… as long as there is some way for someone to see instances that can be reclaimed (soft delete state, which is different from DB soft-delete). There are some periodic tasks that look at deleted records in order to sync things. The one that stands out to me is '_cleanup_running_deleted_instances'. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] cells checks on patches
I have just put up a review here: https://review.openstack.org/#/c/38897/ which should address the exercise.sh issues when n-cell is enabled. Hopefully this works in the gate like it does for me locally. Then we can move on to looking at tempest. - Chris On Jul 15, 2013, at 6:13 AM, Andrew Laski andrew.la...@rackspace.com wrote: I will also be working to help get cells passing tests. I just setup a blueprint on the Nova side for this, https://blueprints.launchpad.net/nova/+spec/cells-gating. On 07/13/13 at 05:00pm, Chris Behrens wrote: I can make a commitment to help getting cells passing. Basically, I'd like to do whatever I can to make sure we can have a useful gate on cells. Unfortunately I'm going to be mostly offline for the next 10 days or so, however. :) I thought there was a sec group patch up for cells, but I've not fully reviewed it. The generic cannot communicate with cell 'child' almost sounds like some other basic issue I'll see if I can take a peek during my layovers tonight. On Jul 13, 2013, at 8:28 AM, Sean Dague s...@dague.net wrote: On 07/13/2013 10:50 AM, Dan Smith wrote: Currently cells can't even get past devstack exercises, which are very minor sanity checks for the environment (nothing tricky). I thought that the plan was to deprecate the devstack exercises and just use tempest. Is that not the case? I'd bet that the devstack exercises are just not even on anyone's radar. Since the excellent work you QA folks did to harden those tests before grizzly, I expect most people take them for granted now :) Digging into the logs just a bit, I see what looks like early failures related to missing security group issues in the cells manager log. I know there are some specific requirements in how things have to be set up for cells, so I think it's likely that we'll need to do some tweaking of configs to get all of this right. We enabled the test knowing that it wasn't going to pass for a while, and it's only been running for less than 24 hours. 
In the same way that the grenade job had (until recently) been failing on everything, the point of enabling the cells test now is so that we can start iterating on fixes so that we can hopefully have some amount of regular test coverage before havana. Like I said, as long as someone is going to work on it, I'm happy. :) I just don't want this to be an enable the tests and hope magical fairies come to fix them issue. That's what we did on full neutron tests, and it's been bouncing around like that for a while. We are planning on disabling the devstack exercises, it wasn't so much that, it's that it looks like there is a fundamental lack of functioning nova on devstack for cells right now. The security groups stack trace is just a side effect of cells falling over in a really low level way (this is what's before and after the trace). 2013-07-13 00:12:18.605 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with cell 'child' 2013-07-13 00:12:18.606 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with any cells Again, mostly I want to know that we've got a blueprint or bug that's high priority and someone's working on it. It did take a while to get grenade there (we're 2 bugs away from being able to do it repeatably in the gate), but during that time we did have people working on it. It just takes a while to get to the bottom of these issues sometimes, so I want people to have a realistic expectation on how quickly we'll go from running upstream to gating. 
-Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] cells checks on patches
I can make a commitment to help getting cells passing. Basically, I'd like to do whatever I can to make sure we can have a useful gate on cells. Unfortunately I'm going to be mostly offline for the next 10 days or so, however. :) I thought there was a sec group patch up for cells, but I've not fully reviewed it. The generic cannot communicate with cell 'child' almost sounds like some other basic issue I'll see if I can take a peek during my layovers tonight. On Jul 13, 2013, at 8:28 AM, Sean Dague s...@dague.net wrote: On 07/13/2013 10:50 AM, Dan Smith wrote: Currently cells can't even get past devstack exercises, which are very minor sanity checks for the environment (nothing tricky). I thought that the plan was to deprecate the devstack exercises and just use tempest. Is that not the case? I'd bet that the devstack exercises are just not even on anyone's radar. Since the excellent work you QA folks did to harden those tests before grizzly, I expect most people take them for granted now :) Digging into the logs just a bit, I see what looks like early failures related to missing security group issues in the cells manager log. I know there are some specific requirements in how things have to be set up for cells, so I think it's likely that we'll need to do some tweaking of configs to get all of this right. We enabled the test knowing that it wasn't going to pass for a while, and it's only been running for less than 24 hours. In the same way that the grenade job had (until recently) been failing on everything, the point of enabling the cells test now is so that we can start iterating on fixes so that we can hopefully have some amount of regular test coverage before havana. Like I said, as long as someone is going to work on it, I'm happy. :) I just don't want this to be an enable the tests and hope magical fairies come to fix them issue. That's what we did on full neutron tests, and it's been bouncing around like that for a while. 
We are planning on disabling the devstack exercises, it wasn't so much that, it's that it looks like there is a fundamental lack of functioning nova on devstack for cells right now. The security groups stack trace is just a side effect of cells falling over in a really low level way (this is what's before and after the trace). 2013-07-13 00:12:18.605 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with cell 'child' 2013-07-13 00:12:18.606 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with any cells Again, mostly I want to know that we've got a blueprint or bug that's high priority and someone's working on it. It did take a while to get grenade there (we're 2 bugs away from being able to do it repeatably in the gate), but during that time we did have people working on it. It just takes a while to get to the bottom of these issues sometimes, so I want people to have a realistic expectation on how quickly we'll go from running upstream to gating. -Sean -- Sean Dague http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] cells checks on patches
On Jul 13, 2013, at 8:28 AM, Sean Dague s...@dague.net wrote: Like I said, as long as someone is going to work on it, I'm happy. :) I just don't want this to be an enable the tests and hope magical fairies come to fix them issue. That's what we did on full neutron tests, and it's been bouncing around like that for a while. We are planning on disabling the devstack exercises, it wasn't so much that, it's that it looks like there is a fundamental lack of functioning nova on devstack for cells right now. The security groups stack trace is just a side effect of cells falling over in a really low level way (this is what's before and after the trace). 2013-07-13 00:12:18.605 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with cell 'child' 2013-07-13 00:12:18.606 ERROR nova.cells.scheduler [req-dcbb868c-98a7-4d65-94b3-e1234c50e623 demo demo] Couldn't communicate with any cells Did you dig these out manually somehow? It looks like, unfortunately, there's no screen-n-cells.txt saved in the gate, which would be extremely useful. :) It looks like all errors must be limited to that service right now… makes me wonder if devstack needs tweaking now for cells. In fact, I *might* know the problem. Some cells config options were deprecated, and it appears that backwards compatibility was lost. I ran into this myself, and I took a stab at fixing it (I was unable to reproduce it in tests, but it certainly showed up in one of our environments). We should probably commit a fix to devstack to use the new config options no matter what: 1) Remove the usage of compute_api_class CONF option 2) Where compute_api_class was set to the ComputeCells class in the API cell, instead use this config: [cells] cell_type=api 3) In a child cell where you did not override compute_api_class, use this: [cells] cell_type=compute Maybe someone could try committing that fix to devstack for me while I'm traveling? 
:) I wonder if that'll get us a little further along... - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Nominating John Garbutt for nova-core
+1 On Jun 26, 2013, at 10:09 AM, Russell Bryant rbry...@redhat.com wrote: Greetings, I would like to nominate John Garbutt for the nova-core team. John has been involved with nova for a long time now. He's primarily known for his great work on the xenapi driver. However, he has been contributing and reviewing in other areas, as well. Based on my experience with him I think he would be a good addition, so it would be great to have him on board to help keep up with the review load. Please respond with +1s or any concerns. References: https://review.openstack.org/#/dashboard/782 https://review.openstack.org/#/q/reviewer:782,n,z https://launchpad.net/~johngarbutt/+specs?role=assignee https://launchpad.net/~johngarbutt/+bugs?role=assignee Thanks, -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Cells design issue
On Jun 21, 2013, at 9:16 AM, Armando Migliaccio amigliac...@vmware.com wrote: In my view a cell should only know about the queue it's connected to, and let the 'global' message queue do its job of dispatching the messages to the right recipient: that would solve the problem altogether. Were federated queues and topic routing not considered fit for the purpose? I guess the drawback with this is that it is tied to Rabbit. If you're referring to the rabbit federation plugin, no, it was not considered. I'm not even sure that via rabbit queues is the right way to talk cell to cell. But I really do not want to get into a full-blown cells communication design discussion here. We can do that in another thread, if we need to do so. :) It is what it is today and this thread is just about how to express the configuration for it. Regarding Mark's config suggestion: On Mon, Jun 17, 2013 at 2:14 AM, Mark McLoughlin mar...@redhat.com wrote: I don't know whether I like it yet or not, but here's how it might look: [cells] parents = parent1 children = child1, child2 [cell:parent1] transport_url = qpid://host1/nova [cell:child1] transport_url = qpid://host2/child1_nova [cell:child2] transport_url = qpid://host2/child2_nova […] Yeah, that's what I was picturing if going that route. I guess the code for it is not bad at all. But with oslo.config, can I reload (re-parse) the config file later, or does the service need to be restarted? - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
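As a rough illustration of how the suggested [cells]/[cell:*] layout could be parsed, here is a sketch using stdlib configparser. The real code would go through oslo.config; whether that can re-parse the file at runtime is exactly the open question in the thread:

```python
import configparser

SAMPLE = """
[cells]
parents = parent1
children = child1, child2

[cell:parent1]
transport_url = qpid://host1/nova

[cell:child1]
transport_url = qpid://host2/child1_nova

[cell:child2]
transport_url = qpid://host2/child2_nova
"""


def load_cells(text):
    """Build a {cell_name: {role, transport_url}} map from config text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    cells = {}
    for option, role in (("parents", "parent"), ("children", "child")):
        for name in cfg.get("cells", option).split(","):
            name = name.strip()
            # Each cell listed in [cells] has a matching [cell:<name>] section.
            cells[name] = {
                "role": role,
                "transport_url": cfg.get("cell:%s" % name, "transport_url"),
            }
    return cells


cells = load_cells(SAMPLE)
print(cells["child2"]["transport_url"])  # qpid://host2/child2_nova
```

Since the whole layout lives in one file, re-parsing on demand is just calling load_cells again on fresh file contents, which is the behavior the reload question asks about.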
Re: [openstack-dev] Compute node stats sent to the scheduler
On Jun 17, 2013, at 7:49 AM, Russell Bryant rbry...@redhat.com wrote: On 06/16/2013 11:25 PM, Dugger, Donald D wrote: Looking into the scheduler a bit, there's an issue of duplicated effort that is a little puzzling. The database table `compute_nodes' is being updated periodically with data about capabilities and resources used (memory, vcpus, ...) while at the same time a periodic RPC call is being made to the scheduler sending pretty much the same data. Does anyone know why we are updating the same data in two different places using two different mechanisms? Also, assuming we were to remove one of these updates, which one should go? (I thought at one point in time there was a goal to create a database-free compute node which would imply we should remove the DB update.) Have you looked around to see if any code is using the data from the db? Having schedulers hit the db for the current state of all compute nodes all of the time would be a large additional db burden that I think we should avoid. So, it makes sense to keep the rpc fanout_cast of current stats to schedulers. This is actually what the scheduler uses. :) The fanout messages are too infrequent and can be too laggy. So, the scheduler was moved to using the DB a long, long time ago… but it was very inefficient, at first, because it looped through all instances. So we added things we needed into compute_node and compute_node_stats so we only had to look at the hosts. You have to pull the hosts anyway, so we pull the stats at the same time. The problem is… when we stopped using certain data from the fanout messages…. we never removed it. We should AT LEAST do this. But.. (see below).. The scheduler also does a fanout_cast to all compute nodes when it starts up to trigger the compute nodes to populate the cache in the scheduler. It would be nice to never fanout_cast to all compute nodes (given that there may be a *lot* of them). 
We could replace this with having the scheduler populate its cache from the database. I think we should audit the remaining things that the scheduler uses from these messages and move them to the DB. I believe it's limited to the hypervisor capabilities to compare against aggregates or some such. I believe it's things that change very rarely… so an alternative could be to only send fanout messages when capabilities change! We could always do that as a first step. Removing the db usage completely would be nice if nothing is actually using it, but we'd have to look into an alternative solution for removing the scheduler fanout_cast to compute. Relying on anything but the DB for current memory free, etc., is just too laggy… so we need to stick with it, IMO. - Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
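The "only send fanout messages when capabilities change" idea could be sketched like this. This is a stub with made-up names, not oslo messaging or nova's real reporting code:

```python
class CapabilityReporter:
    """Broadcast compute-node capabilities only when they change.

    Rarely-changing data (hypervisor capabilities) goes over fanout;
    fast-changing data (free memory, etc.) stays in the database.
    """

    def __init__(self, fanout_cast):
        self._fanout_cast = fanout_cast  # callable that broadcasts to schedulers
        self._last_sent = None

    def report(self, capabilities):
        if capabilities == self._last_sent:
            return False  # unchanged since last send; skip the broadcast
        self._fanout_cast(dict(capabilities))
        self._last_sent = dict(capabilities)
        return True


# Each periodic tick calls report(); only genuine changes hit the queue.
sent = []
reporter = CapabilityReporter(sent.append)
reporter.report({"hypervisor": "kvm", "version": 1})
reporter.report({"hypervisor": "kvm", "version": 1})  # no-op, unchanged
reporter.report({"hypervisor": "kvm", "version": 2})
print(len(sent))  # 2
```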