Re: [openstack-dev] [nova-docker] Status update
Indeed, very interesting use-case. I guess it is similar to any situation where you might want to dynamically scale your undercloud via Heat + Nova, and use the same or a different Nova for the 'overcloud'. You can do something like this today with Ironic and KVM, I think (we experimented with such an 'integrated undercloud+overcloud management' environment for a while, and it seemed to be working, up to a few minor glitches). Of course, the system would need to be carefully configured (maybe requiring some tweaking) if the same Nova is used for both, but it should be feasible, IMO. The differentiation between the two types of Nova instances should be rather straightforward to implement with, for example, image properties or flavor extra specs (or maybe both); see the sketch after this message.

Regards,
Alex

Adrian Otto adrian.o...@rackspace.com wrote on 17/05/2015 11:22:51 PM:

From: Adrian Otto adrian.o...@rackspace.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 17/05/2015 11:28 PM
Subject: Re: [openstack-dev] [nova-docker] Status update

Good questions Matt and Alex. Currently Magnum creates Bays (places that can run containers, or pods of containers, and other high level resources such as services, replication controllers, etc.) composed of one or more Nova instances (Nodes). This way, we can potentially allow the creation and management of containers on any compute form factor (bare metal, VM, container, etc.). The Nova instances Magnum uses to form the Bays come from Heat.

NOTE: There is no such thing as a nova-magnum virt driver today. The following discussion is theoretical.

Understanding that, it would be possible to make a nova-magnum virt driver that talks to Magnum to ask for an instance of type container from an *existing* Bay, but then Magnum would need to have access to Nova instances that are NOT produced by the nova-magnum driver in order to scale out the Bay by adding more nodes to it. If we do this, and the cloud operator does not realize the circular dependency when setting Nova to use a nova-magnum virt driver, it would be possible to create a loop where nova-magnum provides containers to Magnum that come from the same Bay we are attempting to scale out. This would prevent the Bay from actually scaling out, because it would be sourcing capacity from itself. We could allow this to work by requiring anyone who uses nova-magnum to also have another Nova host aggregate that uses an alternate virt driver (Ironic, libvirt, etc.), and having some way for Magnum's Heat template to ask only for instances produced without the Magnum virt driver when forming or scaling Bays. I suppose a scheduling hint might be adequate for this.

Adrian

On May 17, 2015, at 11:48 AM, Matt Riedemann mrie...@linux.vnet.ibm.com wrote:

On 5/16/2015 10:52 PM, Alex Glikson wrote:

If system containers are a viable use-case for Nova, and if Magnum is aiming at both application containers and system containers, would it make sense to have a new virt driver in nova that would invoke the Magnum API for container provisioning and life cycle? This would avoid (some of the) code duplication between Magnum and whatever nova virt driver would support system containers (such as nova-docker). Such an approach would be conceptually similar to the nova virt driver invoking the Ironic API, replacing nova-baremetal (here again, Ironic surfaces various capabilities which don't make sense in Nova). We have recently started exploring this direction, and would be glad to collaborate with folks if this makes sense.
Regards,
Alex

Adrian Otto adrian.o...@rackspace.com wrote on 09/05/2015 07:55:47 PM:

From: Adrian Otto adrian.o...@rackspace.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 09/05/2015 07:57 PM
Subject: Re: [openstack-dev] [nova-docker] Status update

John, Good questions. Remarks in-line from the Magnum perspective.

On May 9, 2015, at 2:51 AM, John Garbutt j...@johngarbutt.com wrote:

On 1 May 2015 at 16:14, Davanum Srinivas dava...@gmail.com wrote:

Anyone still interested in this work? :)
* there's a stable/kilo branch now (see http://git.openstack.org/cgit/stackforge/nova-docker/).
* CI jobs are running fine against both nova trunk and nova's stable/kilo branch.
* there's an updated nova-spec to get code back into the nova tree (see https://review.openstack.org/#/c/128753/)

To proxy the discussion from the etherpad onto the ML, we need to work out why this lives in nova, given Magnum is the place to do container specific things.

To the extent that users want to control Docker containers through the Nova API (without elaborate extensions), I think a stable in-tree nova-docker driver makes complete sense for that. [...] Now whats
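[Editorial note] To make the differentiation Alex mentions above concrete, here is a minimal sketch using python-novaclient; the flavor name, the 'instance_role' extra-spec key, and the scheduler-hint key are invented for illustration, not anything Nova or Magnum defines:

# Hedged sketch: tag instances so the scheduler (or Magnum's Heat template)
# could tell bay capacity apart from other instances. All key names here
# are hypothetical; only the novaclient calls themselves are real.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://keystone:5000/v2.0')  # placeholder credentials

# Option 1: a flavor extra spec marking 'bay node' capacity.
bay_flavor = nova.flavors.create(name='m1.bay-node', ram=4096,
                                 vcpus=2, disk=40)
bay_flavor.set_keys({'instance_role': 'bay-node'})

# Option 2: a scheduler hint at boot time, which a custom filter could use
# to avoid sourcing bay capacity from the bay itself (Adrian's loop).
nova.servers.create(name='bay-node-1',
                    image='<image-uuid>',  # placeholder
                    flavor=bay_flavor.id,
                    scheduler_hints={'exclude_virt_driver': 'magnum'})

An image property (set via glance) would work similarly, with AggregateImagePropertiesIsolation-style filtering on the Nova side.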
Re: [openstack-dev] [nova-docker] Status update
If system containers are a viable use-case for Nova, and if Magnum is aiming at both application containers and system containers, would it make sense to have a new virt driver in nova that would invoke the Magnum API for container provisioning and life cycle? This would avoid (some of the) code duplication between Magnum and whatever nova virt driver would support system containers (such as nova-docker). Such an approach would be conceptually similar to the nova virt driver invoking the Ironic API, replacing nova-baremetal (here again, Ironic surfaces various capabilities which don't make sense in Nova). We have recently started exploring this direction, and would be glad to collaborate with folks if this makes sense.

Regards,
Alex

Adrian Otto adrian.o...@rackspace.com wrote on 09/05/2015 07:55:47 PM:

From: Adrian Otto adrian.o...@rackspace.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 09/05/2015 07:57 PM
Subject: Re: [openstack-dev] [nova-docker] Status update

John, Good questions. Remarks in-line from the Magnum perspective.

On May 9, 2015, at 2:51 AM, John Garbutt j...@johngarbutt.com wrote:

On 1 May 2015 at 16:14, Davanum Srinivas dava...@gmail.com wrote:

Anyone still interested in this work? :)
* there's a stable/kilo branch now (see http://git.openstack.org/cgit/stackforge/nova-docker/).
* CI jobs are running fine against both nova trunk and nova's stable/kilo branch.
* there's an updated nova-spec to get code back into the nova tree (see https://review.openstack.org/#/c/128753/)

To proxy the discussion from the etherpad onto the ML, we need to work out why this lives in nova, given Magnum is the place to do container specific things.

To the extent that users want to control Docker containers through the Nova API (without elaborate extensions), I think a stable in-tree nova-docker driver makes complete sense for that.

[...] Now what's the reason for adding the Docker driver, given Nova is considering container specific APIs out of scope, and expecting Magnum to own that kind of thing.

I do think nova-docker should find its way into the Nova tree. This makes containers more accessible in OpenStack, and appropriate for use cases where users want to treat containers like they treat virtual machines. On the subject of extending the Nova API to accommodate special use cases of containers that are beyond the scope of the Nova API, I think we should resist that, and focus those container-specific efforts in Magnum. That way, cloud operators can choose whether to use Nova or Magnum for their container use cases depending on the range of features they desire from the API. This approach should also result in less overlap of efforts.

[...] To sum up, I strongly support merging in nova-docker, with the caveat that it operates within the existing Nova API (with a few minor exceptions). For features that require API capabilities that are truly container specific, we should land those in Magnum, and keep the Nova API scoped to operations that are appropriate for 'all' instance types.

Adrian

Thanks,
John
Re: [openstack-dev] [all] Re-evaluating the suitability of the 6 month release cycle
Tom Fifield t...@openstack.org wrote on 25/02/2015 06:46:13 AM:

On 24/02/15 19:27, Daniel P. Berrange wrote:
On Tue, Feb 24, 2015 at 12:05:17PM +0100, Thierry Carrez wrote:
Daniel P. Berrange wrote:

[...] I'm not familiar with how the translations work, but if they are waiting until the freeze before starting translation work I'd say that is a mistaken approach. Obviously during the active dev part of the cycle, some translated strings are in flux, so if translation was taking place in parallel there could be some wasted effort, but I'd expect that to be the minority case. I think the majority of translation work can be done in parallel with dev work and the freeze time just needs to tie up the small remaining bits.

So, two points:

1) We wouldn't be talking about throwing just a couple of percent of their work away. As an example, even without looking at the introduction of new strings or deleting others, you may not be aware that changing a single word in a string in the code means that entire string needs to be re-translated. Even with the extensive translation memory systems we have making suggestions as best they can, we're talking about very, very significant amounts of wasted effort.

How difficult would it be to try quantifying this wasted effort? For example, if someone could write a script that extracts the data for a histogram showing the amount of strings (e.g., in Nova) that have been changed/overridden in subsequent patches up to 1 week apart, between 1 and 2 weeks apart, and so on up to, say, 52 weeks.

Regards,
Alex

Regards,
Tom
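[Editorial note] For what it's worth, a rough sketch of the script Alex asks for, under the assumption that one has extracted dated snapshots of a project's .pot file from git history (the file layout and naming are made up; polib is a real library; a string counts as "churned" if its msgid disappears between snapshots):

# Hedged sketch of the churn histogram. Snapshots are assumed to be named
# snapshots/week-00.pot (at release) through snapshots/week-52.pot, e.g.
# produced with `git show <rev>:nova/locale/nova.pot`.
import glob
import polib
from collections import Counter

def msgids(path):
    """Return the set of translatable strings in one .pot snapshot."""
    return {entry.msgid for entry in polib.pofile(path)}

paths = sorted(glob.glob('snapshots/week-*.pot'))
histogram = Counter()

for weeks_before_release, (newer, older) in enumerate(zip(paths, paths[1:])):
    # Strings present in the older snapshot but gone (changed or deleted)
    # in the newer one would have been wasted translation effort.
    churned = msgids(older) - msgids(newer)
    histogram[weeks_before_release + 1] = len(churned)

for weeks, count in sorted(histogram.items()):
    print('%2d week(s) before release: %5d strings churned' % (weeks, count))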
Re: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading
This sounds related to the discussion on the 'Nova clustered hypervisor driver' which started at the Juno design summit [1]. Talking to another OpenStack should be similar to talking to vCenter. The idea was that the Cells support could be refactored around this notion as well. Not sure whether there has been any active progress with this in Juno, though.

Regards,
Alex

[1] http://junodesignsummit.sched.org/event/a0d38e1278182eb09f06e22457d94c0c#
[2] https://etherpad.openstack.org/p/juno-nova-clustered-hypervisor-support

From: joehuang joehu...@huawei.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 30/09/2014 04:08 PM
Subject: [openstack-dev] [all] [tc] Multi-clouds integration by OpenStack cascading

Hello, Dear TC and all,

Large cloud operators prefer to deploy multiple OpenStack instances (as different zones), rather than a single monolithic OpenStack instance, because of these reasons:
1) Multiple data centers distributed geographically;
2) Multi-vendor business policy;
3) Server nodes scale up modularized from 00's up to million;
4) Fault and maintenance isolation between zones (only REST interface);

At the same time, they also want to integrate these OpenStack instances into one cloud. Instead of a proprietary orchestration layer, they want to use the standard OpenStack framework for Northbound API compatibility with HEAT/Horizon or other 3rd ecosystem apps. We call this pattern OpenStack Cascading, with the proposal described by [1][2]. PoC live demo video can be found at [3][4]. Nova, Cinder, Neutron, Ceilometer and Glance (optional) are involved in the OpenStack cascading.

Kindly ask for a cross program design summit session to discuss OpenStack cascading and the contribution to Kilo. Kindly invite those who are interested in the OpenStack cascading to work together and contribute it to OpenStack. (I applied for the “other projects” track [5], but it would be better to have a discussion as a formal cross program session, because many core programs are involved.)

[1] wiki: https://wiki.openstack.org/wiki/OpenStack_cascading_solution
[2] PoC source code: https://github.com/stackforge/tricircle
[3] Live demo video at YouTube: https://www.youtube.com/watch?v=OSU6PYRz5qY
[4] Live demo video at Youku (low quality, for those who can't access YouTube): http://v.youku.com/v_show/id_XNzkzNDQ3MDg4.html
[5] http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg36395.html

Best Regards
Chaoyi Huang ( Joe Huang )
Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler
So maybe the problem isn't having the flavors so much, but in how the user currently has to specify an exact match from that list. If the user could say "I want a flavor with these attributes" and then the system would find a "best match" based on criteria set by the cloud admin, then would that be a more user-friendly solution?

Interesting idea. Any thoughts on how this can be achieved?

Alex

From: Day, Phil philip@hp.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 06/06/2014 12:38 PM
Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

From: Scott Devoid [mailto:dev...@anl.gov]
Sent: 04 June 2014 17:36
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Proposal: Move CPU and memory allocation ratio out of scheduler

Not only live upgrades but also dynamic reconfiguration.

Overcommitting affects the quality of service delivered to the cloud user. In this situation in particular, as in many situations in general, I think we want to enable the service provider to offer multiple qualities of service. That is, enable the cloud provider to offer a selectable level of overcommit. A given instance would be placed in a pool that is dedicated to the relevant level of overcommit (or, possibly, a better pool if the selected one is currently full). Ideally the pool sizes would be dynamic. That's the dynamic reconfiguration I mentioned preparing for.

+1 This is exactly the situation I'm in as an operator. You can do different levels of overcommit with host-aggregates and different flavors, but this has several drawbacks:

1. The nature of this is slightly exposed to the end-user, through extra-specs and the fact that two flavors cannot have the same name. One scenario we have is that we want to be able to document our flavor names--what each name means, but we want to provide different QoS standards for different projects. Since flavor names must be unique, we have to create different flavors for different levels of service. Sometimes you do want to lie to your users!

[Day, Phil] I agree that there is a problem with having every new option we add in extra_specs leading to a new set of flavors. There are a number of changes up for review to expose more hypervisor capabilities via extra_specs that also have this potential problem. What I'd really like to be able to ask for as a user is something like "a medium instance with a side order of overcommit", rather than have to choose from a long list of variations. I did spend some time trying to think of a more elegant solution, but as the user wants to know what combinations are available, it pretty much comes down to needing that full list of combinations somewhere. So maybe the problem isn't having the flavors so much, but in how the user currently has to specify an exact match from that list. If the user could say "I want a flavor with these attributes" and then the system would find a "best match" based on criteria set by the cloud admin (for example I might or might not want to allow a request for an overcommitted instance to use my not-overcommitted flavor, depending on the roles of the tenant), then would that be a more user-friendly solution?

2. If I have two pools of nova-compute HVs with different overcommit settings, I have to manage the pool sizes manually. Even if I use puppet to change the config and flip an instance into a different pool, that requires me to restart nova-compute.
Not an ideal situation.

[Day, Phil] If the pools are aggregates, and the overcommit is defined by aggregate meta-data, then I don't see why you need to restart nova-compute.

3. If I want to do anything complicated, like 3 overcommit tiers with good, better, best performance, and allow the scheduler to pick "better" for a "good" instance if the "good" pool is full, this is very hard and complicated to do with the current system.

[Day, Phil] Yep, a combination of filters and weighting functions would allow you to do this; it's not really tied to whether the overcommit is defined in the scheduler or the host, though, as far as I can see.

I'm looking forward to seeing this in nova-specs!

~ Scott
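[Editorial note] A toy sketch of the "best match" selection Phil and Alex float above; the attribute names, the admin-policy hook, and the scoring rule (least oversized wins) are all invented for illustration:

# Hypothetical "best match" flavor chooser: given desired attributes, pick
# the least oversized acceptable flavor instead of requiring an exact name.
# In practice the candidate list would come from the flavors API.
def best_match(flavors, vcpus, ram_mb, overcommit_ok=True):
    def acceptable(f):
        if f['vcpus'] < vcpus or f['ram_mb'] < ram_mb:
            return False
        # Admin policy hook: e.g., keep overcommit-tolerant requests off
        # non-overcommitted flavors depending on the tenant's roles.
        if not overcommit_ok and f['extra_specs'].get('overcommit') == 'true':
            return False
        return True

    candidates = [f for f in flavors if acceptable(f)]
    if not candidates:
        return None
    # "Best" = smallest surplus of vCPUs, ties broken by surplus RAM.
    return min(candidates,
               key=lambda f: (f['vcpus'] - vcpus, f['ram_mb'] - ram_mb))

flavors = [
    {'name': 'm1.medium', 'vcpus': 2, 'ram_mb': 4096, 'extra_specs': {}},
    {'name': 'm1.medium.oc', 'vcpus': 2, 'ram_mb': 4096,
     'extra_specs': {'overcommit': 'true'}},
]
print(best_match(flavors, vcpus=2, ram_mb=4096, overcommit_ok=False))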
Re: [openstack-dev] etherpad to track Gantt (scheduler) sessions at the Juno summit
It seems that there are also issues around scheduling in environments that comprise non-flat/homogeneous groups of hosts. Perhaps related to the 'clustered hypervisor support in Nova' proposal (http://summit.openstack.org/cfp/details/145). Not sure whether we need a separate slot for this or not, but it is certainly related to Gantt.

Regards,
Alex

From: Dugger, Donald D donald.d.dug...@intel.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, sylvain.ba...@bull.net sylvain.ba...@bull.net
Date: 08/04/2014 11:07 PM
Subject: [openstack-dev] etherpad to track Gantt (scheduler) sessions at the Juno summit

As promised at the Gantt meeting today I've created an etherpad we can use to keep track of scheduler related sessions at the Juno summit. (I've made the name of the pad generic so we can re-use that pad for follow-on summits and not have to change any wiki links):

https://etherpad.openstack.org/p/Gantt-summit-sessions

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786
Re: [openstack-dev] [heat] Can heat automatically create a flavor as part of stack creation?
A Heat template orchestrates user actions, while management of flavors is typically the admin's job (due to their tight link to the physical hardware configuration, unknown to a regular user).

Regards,
Alex

From: ELISHA, Moshe (Moshe) moshe.eli...@alcatel-lucent.com
To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org
Date: 09/02/2014 09:54 AM
Subject: [openstack-dev] [heat] Can heat automatically create a flavor as part of stack creation?

Hello,

I am wondering if, instead of being obligated to use an existing flavor, I could declare a flavor (with its properties) inside a Heat template and let Heat create the flavor automatically? Similar to the ability to create networks as part of the template.

Thanks.
Re: [openstack-dev] [Nova] bp: nova-ecu-support
Similar capabilities are being introduced here: https://review.openstack.org/#/c/61839/

Regards,
Alex

From: Kenichi Oomichi oomi...@mxs.nes.nec.co.jp
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 03/02/2014 11:48 AM
Subject: [openstack-dev] [Nova] bp: nova-ecu-support

Hi,

There is a blueprint ECU [1], and that is an interesting idea for me, so I'd like to know the comments about the ECU idea.

After production environments start, the operators will need to add compute nodes before exhausting the capacity. In that scenario, they'd like to add cost-efficient machines as compute nodes at the time. So the production environments will consist of compute nodes with different performance. Also they hope to provide virtual machines with the same performance on nodes with different performance if the same flavor is specified. Now nova contains flavor_extraspecs [2] which can customize the cpu bandwidth for each flavor:

# nova flavor-key m1.low_cpu set quota:cpu_quota=1
# nova flavor-key m1.low_cpu set quota:cpu_period=2

However, this feature cannot provide the same vm performance on nodes with different performance, because this arranges the vm performance with the same ratio (cpu_quota/cpu_period) only, even if the compute node performances are different. So it is necessary to arrange a different ratio based on each compute node's performance. Amazon EC2 has ECU [3] already for implementing this, and the blueprint [1] is also for it.

Any thoughts?

Thanks
Ken'ichi Ohmichi

---
[1]: https://blueprints.launchpad.net/nova/+spec/nova-ecu-support
[2]: http://docs.openstack.org/admin-guide-cloud/content/ch_introduction-to-openstack-compute.html#customize-flavors
[3]: http://aws.amazon.com/ec2/faqs/ Q: What is a “EC2 Compute Unit” and why did you introduce it?
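[Editorial note] To illustrate the kind of per-host normalization the blueprint implies (not what it actually specifies), here is a hedged helper that derives cpu_quota from a hypothetical per-core ECU rating of each host; the ratings are made-up numbers:

# Illustrative only: derive a per-host libvirt CPU quota so that a flavor
# promising `flavor_ecus` delivers roughly the same compute regardless of
# the host's per-core rating.
CPU_PERIOD_US = 100000  # libvirt's default scheduling period (100 ms)

def cpu_quota_for(flavor_ecus, flavor_vcpus, host_ecus_per_core):
    """Fraction of each physical core this instance should receive."""
    share_per_vcpu = (flavor_ecus / float(flavor_vcpus)) / host_ecus_per_core
    return int(CPU_PERIOD_US * share_per_vcpu)

# The same 2-ECU, 1-vCPU flavor gets a bigger slice of a slow core and a
# smaller slice of a fast one:
print(cpu_quota_for(2.0, 1, host_ecus_per_core=2.5))  # older node -> 80000
print(cpu_quota_for(2.0, 1, host_ecus_per_core=4.0))  # newer node -> 50000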
Re: [openstack-dev] [Gantt] Scheduler sub-group agenda 1/7
Maybe we can also briefly discuss the status of https://review.openstack.org/#/q/topic:bp/multiple-scheduler-drivers,n,z -- now that a revised implementation is available for review (broken into 4 small patches), and people are back from vacations, it would be good to get some attention from the relevant folks.

Thanks,
Alex

From: Dugger, Donald D donald.d.dug...@intel.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 07/01/2014 05:24 AM
Subject: [openstack-dev] [Gantt] Scheduler sub-group agenda 1/7

1) Memcached based scheduler updates
2) Scheduler code forklift
3) Instance groups

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786
Re: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime
Great initiative! I would certainly be interested in taking part in this -- although I wouldn't necessarily claim to be among the people with the know-how to design and implement it well. For sure this is going to be a painful but exciting process.

Regards,
Alex

From: Robert Collins robe...@robertcollins.net
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Date: 21/11/2013 11:00 PM
Subject: [openstack-dev] [Nova][Scheduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime

https://etherpad.openstack.org/p/icehouse-external-scheduler

I'm looking for 4-5 folk who have:
- modest Nova skills
- time to follow a fairly mechanical (but careful and detailed work needed) plan to break the status quo around scheduler extraction

And of course, discussion galore about the idea :)

Cheers,
Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Nova] Does Nova really need an SQL database?
Another possible approach could be that only part of the 50 succeeds (reported back to the user), and then a retry mechanism at a higher level would potentially approach the other partition/scheduler - similar to today's retries.

Regards,
Alex

From: Mike Wilson geekinu...@gmail.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 20/11/2013 05:53 AM
Subject: Re: [openstack-dev] [Nova] Does Nova really need an SQL database?

I've been thinking about this use case for a DHT-like design. I think I want to do what other people have alluded to here and try to intercept problematic requests like this one in some sort of pre-sending-to-ring-segment stage. In this case the pre-stage could decide to send this off to a scheduler that has a more complete view of the world. Alternatively, don't make a single request for 50 instances, just send 50 requests for one? Is that a viable thing to do for this use case?

-Mike

On Tue, Nov 19, 2013 at 7:03 PM, Joshua Harlow harlo...@yahoo-inc.com wrote:

At yahoo at least 50+ simultaneous will be the common case (maybe we are special). Think of what happens on www.yahoo.com, say, during the Olympics; news.yahoo.com could need 50+ very very quickly (especially if, say, a gold medal is won by some famous person). So I wouldn't discount those being the common case (may not be common for some, but is common for others). In fact any website with spurious/spikey traffic will have the same desire; so it might be a target use-case for website-like companies (or ones that can't predict spikes up front). Overall though I think what you said about 'don't fill it up' is good general knowledge. Filling up stuff beyond a certain threshold is dangerous just in general (one should only push the limits so far before madness).

On 11/19/13 4:08 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Chris Friesen's message of 2013-11-19 12:18:16 -0800:

On 11/19/2013 01:51 PM, Clint Byrum wrote:

Excerpts from Chris Friesen's message of 2013-11-19 11:37:02 -0800:

On 11/19/2013 12:35 PM, Clint Byrum wrote:

Each scheduler process can own a different set of resources. If they each grab instance requests in a round-robin fashion, then they will fill their resources up in a relatively well balanced way until one scheduler's resources are exhausted. At that time it should bow out of taking new instances. If it can't fit a request in, it should kick the request out for retry on another scheduler. In this way, they only need to be in sync in that they need a way to agree on who owns which resources. A distributed hash table that gets refreshed whenever schedulers come and go would be fine for that.

That has some potential, but at high occupancy you could end up refusing to schedule something because no one scheduler has sufficient resources even if the cluster as a whole does.

I'm not sure what you mean here. What resource spans multiple compute hosts?

Imagine the cluster is running close to full occupancy, each scheduler has room for 40 more instances. Now I come along and issue a single request to boot 50 instances. The cluster has room for that, but none of the schedulers do.

You're assuming that all 50 come in at once. That is only one use case and not at all the most common.

This gets worse once you start factoring in things like heat and instance groups that will want to schedule whole sets of resources (instances, IP addresses, network links, cinder volumes, etc.) at once with constraints on where they can be placed relative to each other.
Actually that is rather simple. Such requests have to be serialized into a work-flow. So if you say "give me 2 instances in 2 different locations", then you allocate 1 instance, and then another one with 'not_in_location(1)' as a condition.

Actually, you don't want to serialize it, you want to hand the whole set of resource requests and constraints to the scheduler all at once. If you do them one at a time, then early decisions made with less-than-complete knowledge can result in later scheduling requests failing due to being unable to meet constraints, even if there are actually sufficient resources in the cluster. The VM ensembles document at https://docs.google.com/document/d/1bAMtkaIFn4ZSMqqsXjs_riXofuRvApa--qo4UTwsmhw/edit?pli=1 has a good example of how one-at-a-time scheduling can cause spurious failures.

And if you're handing the whole set of requests to a scheduler all at once, then you want the scheduler to have access to as many resources as possible so that it has the highest likelihood of being able to satisfy the request given the constraints.

This use case is real and valid, which is why I think there is room for multiple approaches. For instance the situation you describe can also be dealt with by just having the cloud stay under-utilized and accepting that when you get over
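[Editorial note] For clarity, a tiny sketch of the serialize-into-a-workflow approach from the thread; pick_host stands in for a real scheduler call, and the anti-affinity rule mirrors the 'not_in_location(1)' condition:

# Hedged sketch: schedule a group one instance at a time, feeding each
# decision back into the next request as an exclusion constraint.
def pick_host(hosts, excluded):
    for host in hosts:
        if host not in excluded:
            return host
    raise RuntimeError('no host satisfies the constraints')

def schedule_group(hosts, count, anti_affinity=True):
    placements = []
    for _ in range(count):
        excluded = set(placements) if anti_affinity else set()
        placements.append(pick_host(hosts, excluded))
    return placements

# Two instances in two different locations:
print(schedule_group(['node1', 'node2', 'node3'], count=2))

Clint's caveat applies: each greedy step narrows later choices, so a whole-set solver can succeed where this loop fails.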
Re: [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo scheduler/filters for nova and cinder
Boris Pavlovic bpavlo...@mirantis.com wrote on 18/11/2013 08:31:20 AM:

Actually schedulers in nova and cinder are almost the same.

Well, this is kind of expected, since the Cinder scheduler started as a copy-paste of the Nova scheduler :-) But they have already started diverging (not sure whether this is necessarily a bad thing or not).

So, Cinder (as well as Neutron, and potentially others) would need to be hooked to Nova rpc?

As a first step, to prove the approach, yes, but I hope that we won't have a nova or cinder scheduler at all. We will have just a scheduler that works well.

So, do you envision this code being merged in Nova first, and then moved out? Or started as a new thing from the beginning? Also, when it becomes separate (probably not in icehouse?), will the communication continue being over RPC, or would we need to switch to REST? This could be conceptually similar to the communication between cells today, via a separate RPC. By the way, since the relationships between resources are likely to reside in the Heat DB, it could make sense to have this thing as a new Engine under the Heat umbrella (as discussed in a couple of other threads, you are also likely to need orchestration when dealing with groups of resources).

Instances of memcached. In an environment with multiple schedulers. I think you mentioned that if we have, say, 10 schedulers, we will also have 10 instances of memcached.

Actually we are going to make an implementation based on sqlalchemy as well. In the case of memcached I just described one possible architecture: you could run a memcached instance on each server with a scheduler service. But it is not required; you can have even just one memcached instance for all schedulers (but it is not HA).

I am not saying that having multiple instances of memcached is wrong - just that it would require some work.. It seems that one possible approach could be partitioning -- each scheduler will take care of a subset of the environment (availability zone?). This way data will be naturally partitioned too, and the data in the memcached instances will not need to be synchronized. Of course, making this HA would also require some effort (something like ZooKeeper could be really useful to manage all of this - configuration of each scheduler, ownership of underlying 'zones', leader election, etc).

Regards,
Alex

Best regards,
Boris Pavlovic
---
Mirantis Inc.
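[Editorial note] As a concrete (and heavily simplified) illustration of the snapshot-sync idea being discussed, here is a sketch using the python-memcached client; the key naming scheme and the host-state fields are assumptions for illustration, not what the patch under review actually does:

# Each scheduler publishes its hosts' state into memcached and rebuilds
# its view of the world from the shared cache.
import json
import memcache

mc = memcache.Client(['127.0.0.1:11211'])  # one shared instance (not HA)

def publish_host_state(host, free_ram_mb, free_vcpus):
    mc.set('host_state/%s' % host,
           json.dumps({'free_ram_mb': free_ram_mb,
                       'free_vcpus': free_vcpus}))

def snapshot(hosts):
    """A scheduler's (possibly stale) snapshot of the world state."""
    states = mc.get_multi(['host_state/%s' % h for h in hosts])
    return dict((k.split('/', 1)[1], json.loads(v))
                for k, v in states.items())

publish_host_state('compute-01', 8192, 4)
publish_host_state('compute-02', 2048, 1)
print(snapshot(['compute-01', 'compute-02']))

With the partitioning Alex suggests, each scheduler would only publish and read keys for its own zone, so no cross-instance synchronization is needed.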
Re: [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo scheduler/filters for nova and cinder
Boris Pavlovic bpavlo...@mirantis.com wrote on 15/11/2013 05:57:20 PM:

How do you envision the life cycle of such a scheduler in terms of code repository, build, test, etc?

As a first step we could just make it inside nova; when we finish and prove that this approach works well, we could split it out of nova into a separate project and integrate it with devstack and so on...

So, Cinder (as well as Neutron, and potentially others) would need to be hooked to Nova rpc?

What kind of changes to provisioning APIs do you envision to 'feed' such a scheduler?

At this moment nova.scheduler is already a separate service with an amqp queue; what we need at this moment is to add 1 new rpc method to it, that will update the state of some host.

I was referring to external (REST) APIs. E.g., to specify affinity.

Also, there are some interesting technical challenges (e.g., state management across a potentially large number of instances of memcached).

10-100k key-values is nothing for memcached. So what kind of instances?

Instances of memcached. In an environment with multiple schedulers. I think you mentioned that if we have, say, 10 schedulers, we will also have 10 instances of memcached.

Regards,
Alex

Best regards,
Boris Pavlovic

On Sun, Nov 10, 2013 at 4:20 PM, Alex Glikson glik...@il.ibm.com wrote:

Hi Boris,

This is a very interesting approach. How do you envision the life cycle of such a scheduler in terms of code repository, build, test, etc? What kind of changes to provisioning APIs do you envision to 'feed' such a scheduler? Any particular reason you didn't mention Neutron? Also, there are some interesting technical challenges (e.g., state management across a potentially large number of instances of memcached).

Thanks,
Alex

Boris Pavlovic bpavlo...@mirantis.com wrote on 10/11/2013 07:05:42 PM:

From: Boris Pavlovic bpavlo...@mirantis.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 10/11/2013 07:07 PM
Subject: Re: [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo scheduler/filters for nova and cinder

Jay,

Hi Jay, yes we were working on putting all common stuff in oslo-scheduler (not only filters). As a result of this work we understood that this is the wrong approach, because it makes the resulting code very complex and unclear. And actually we didn't find a way to put all common stuff inside oslo. Instead of trying to make life too complex we found a better approach: implement scheduler-as-a-service that can scale (the current solution has some scale issues) and store all data from nova, cinder and probably other places.

To implement such an approach we should change the current architecture a bit:
1) Scheduler should store all its data (not nova.db / cinder.db)
2) Scheduler should always have its own snapshot of world state, and sync it with other schedulers using something that is quite fast (e.g. memcached)
3) Merge scheduler rpc methods from nova & cinder into one scheduler (it is possible if we store all data from cinder & nova in one scheduler).
4) Drop the cinder and nova tables that store host states (as we don't need them)

We implemented already a base start (a mechanism that stores a snapshot of world state & syncs it between different schedulers): https://review.openstack.org/#/c/45867/ (it is still a bit WIP)

Best regards,
Boris Pavlovic
---
Mirantis Inc.
On Sun, Nov 10, 2013 at 1:59 PM, Jay Lau jay.lau@gmail.com wrote:

I noticed that there is already a bp in oslo tracking what I want to do: https://blueprints.launchpad.net/oslo/+spec/oslo-scheduler

Thanks,
Jay

2013/11/9 Jay Lau jay.lau@gmail.com

Greetings,

Now in oslo, we already put some scheduler filters/weights logic there, and cinder is using the oslo scheduler filters/weights logic; it seems we want both nova & cinder to use this logic in the future. I found some problems, as follows:

1) In cinder, some filters/weights logic resides in cinder/openstack/common/scheduler and some filter/weight logic in cinder/scheduler; this is not consistent and will also make some cinder hackers confused: where shall I put the scheduler filter/weight?
2) Nova is not using filters/weights from oslo and also not using entry points to handle all filters/weights.
3) There are not enough filters in oslo; we may need to add more there, such as same host filter, different host filter, retry filter, etc.

So my proposal is as follows:
1) Add more filters to oslo, such as same host filter, different host filter, retry filter, etc.
2) Move all filters/weights logic in cinder from cinder/scheduler to cinder/openstack/common/scheduler
3) Enable nova to use filter/weight logic from oslo (move all filter logic to nova/openstack/common/scheduler) and also use entry points to handle all filters/weights logic.

Comments?
Re: [openstack-dev] Split of the openstack-dev list
Sylvain Bauza sylvain.ba...@bull.net wrote on 15/11/2013 11:13:37 AM:

On a technical note, as a Stackforge contributor, I'm trying to implement best practices of OpenStack coding into my own project, and I'm facing day-to-day issues trying to understand what Oslo libs do or how they can be used in a fashion manner. Should I want to ask a question to the community, I would have to cross-post to both lists.

+1

To generalize a bit, there are many stackforge projects which are tightly related to more mature OpenStack projects, and restricting the discussion to a subset of the audience might not be a good idea. For example, TaskFlow versus Mistral versus Heat, Solum versus Heat, Manila versus Cinder, Designate versus Neutron, and I am sure there are (and surely will be) other examples. IMO, proper tagging could be a better solution.

Regards,
Alex
Re: [openstack-dev] [Nova] Icehouse Blueprints
Russell Bryant rbry...@redhat.com wrote on 15/11/2013 06:49:31 PM:

3) If you have work planned for Icehouse, please get your blueprints filed as soon as possible. Be sure to set a realistic target milestone. So far, *everyone* has targeted *everything* to icehouse-1, which is set to be released in less than 3 weeks. That's far from realistic.

Perhaps one possible reason might be that until recently icehouse-1 was the only icehouse milestone available in launchpad. Good to know that the others are now also available.

Regards,
Alex

4) We're assuming that anything not targeted to a release milestone is not actively being worked on. Soon (in one week) we will start closing all blueprints not targeted to a release milestone.

Thanks!

[1] http://lists.openstack.org/pipermail/openstack-dev/2013-October/017290.html
[2] https://etherpad.openstack.org/p/NovaIcehouseProjectStructureAndProcess
[3] https://wiki.openstack.org/wiki/Blueprints

--
Russell Bryant
Re: [openstack-dev] [nova] Configure overcommit policy
In fact, there is a blueprint which would enable supporting this scenario without partitioning -- https://blueprints.launchpad.net/nova/+spec/cpu-entitlement

The idea is to annotate flavors with CPU allocation guarantees, and enable differentiation between instances, potentially running on the same host. The implementation augments the CoreFilter code to factor in the differentiation. Hopefully this will be out for review soon.

Regards,
Alex

From: John Garbutt j...@johngarbutt.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 14/11/2013 04:57 PM
Subject: Re: [openstack-dev] [nova] Configure overcommit policy

On 13 November 2013 14:51, Khanh-Toan Tran khanh-toan.t...@cloudwatt.com wrote:

Well, I don't know what John means by "modify the over-commit calculation in the scheduler", so I cannot comment.

I was talking about this code: https://github.com/openstack/nova/blob/master/nova/scheduler/filters/core_filter.py#L64

But I am not sure that's what you want.

The idea of choosing a free host for Hadoop on the fly is rather complicated and contains several operations, namely: (1) assuring the host never gets past 100% CPU load; (2) identifying a host that already has a Hadoop VM running on it, or already has 100% CPU commitment; (3) releasing the host from 100% CPU commitment once the Hadoop VM stops; (4) possibly preventing other applications from using the host (to economize the host resource).
- You'll need (1) because otherwise your Hadoop VM would come up short of resources after the host gets overloaded.
- You'll need (2) because you don't want to restrict a new host while one of your 100% CPU committed hosts still has free resources.
- You'll need (3) because otherwise your host would be forever restricted, and that is no longer "on the fly".
- You may need (4) because otherwise it'd be a waste of resources.

The problem of changing CPU overcommit on the fly is that when your Hadoop VM is still running, someone else can add another VM on the same host with a higher CPU overcommit (e.g. 200%), violating (1) and thus affecting your Hadoop VM also. The idea of putting the host in an aggregate can give you (1) and (2). (4) is done by AggregateInstanceExtraSpecsFilter. However, it does not give you (3); which can be done with pCloud.

Step 1: use flavors so nova can tell between the two workloads, and configure them differently
Step 2: find capacity for your workload given your current cloud usage

At the moment, most of our solutions involve reserving bits of your cloud capacity for different workloads, generally using host aggregates. The issue with claiming back capacity from other workloads is a bit trickier. The issue is I don't think you have defined where you get that capacity back from? Maybe you want to look at giving some workloads a higher priority over the constrained CPU resources? But you will probably starve the little people out at random, which seems bad. Maybe you want to have a concept of spot instances where they can use your spare capacity until you need it, and you can just kill them? But maybe I am misunderstanding your use case; it's not totally clear to me.

John
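[Editorial note] For illustration, a hedged sketch of the kind of CoreFilter augmentation Alex describes; the 'guarantee:cores' extra-spec key and the per-host accounting attribute are hypothetical (the real blueprint may differ), though host_passes() matches the Nova filter interface of that era:

# Sketch of a scheduler filter honoring a per-flavor CPU guarantee.
from nova.scheduler import filters


class GuaranteedCoreFilter(filters.BaseHostFilter):
    """Only pass hosts that can still honor full-strength vCPUs."""

    def host_passes(self, host_state, filter_properties):
        instance_type = filter_properties.get('instance_type') or {}
        extra_specs = instance_type.get('extra_specs', {})
        guaranteed = float(extra_specs.get('guarantee:cores', 0))
        if not guaranteed:
            return True  # no guarantee requested; plain CoreFilter applies
        # Assume host_state tracks cores already promised at 1:1; this is
        # a hypothetical attribute, stock HostState has no such field.
        promised = getattr(host_state, 'guaranteed_vcpus_used', 0)
        return promised + guaranteed <= host_state.vcpus_total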
Re: [openstack-dev] [nova] Configure overcommit policy
Khanh-Toan Tran khanh-toan.t...@cloudwatt.com wrote on 14/11/2013 06:27:39 PM:

It is interesting to see the development of the CPU entitlement blueprint that Alex mentioned. It was registered in Jan 2013. Any idea whether it is still going on?

Yes. I hope we will be able to rebase and submit it for review soon.

Regards,
Alex

From: Alex Glikson [mailto:glik...@il.ibm.com]
Sent: Thursday, 14 November 2013 16:13
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Configure overcommit policy

In fact, there is a blueprint which would enable supporting this scenario without partitioning -- https://blueprints.launchpad.net/nova/+spec/cpu-entitlement

The idea is to annotate flavors with CPU allocation guarantees, and enable differentiation between instances, potentially running on the same host. The implementation augments the CoreFilter code to factor in the differentiation. Hopefully this will be out for review soon.

Regards,
Alex

From: John Garbutt j...@johngarbutt.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 14/11/2013 04:57 PM
Subject: Re: [openstack-dev] [nova] Configure overcommit policy

On 13 November 2013 14:51, Khanh-Toan Tran khanh-toan.t...@cloudwatt.com wrote:

Well, I don't know what John means by "modify the over-commit calculation in the scheduler", so I cannot comment.

I was talking about this code: https://github.com/openstack/nova/blob/master/nova/scheduler/filters/core_filter.py#L64

But I am not sure that's what you want.

The idea of choosing a free host for Hadoop on the fly is rather complicated and contains several operations, namely: (1) assuring the host never gets past 100% CPU load; (2) identifying a host that already has a Hadoop VM running on it, or already has 100% CPU commitment; (3) releasing the host from 100% CPU commitment once the Hadoop VM stops; (4) possibly preventing other applications from using the host (to economize the host resource).
- You'll need (1) because otherwise your Hadoop VM would come up short of resources after the host gets overloaded.
- You'll need (2) because you don't want to restrict a new host while one of your 100% CPU committed hosts still has free resources.
- You'll need (3) because otherwise your host would be forever restricted, and that is no longer "on the fly".
- You may need (4) because otherwise it'd be a waste of resources.

The problem of changing CPU overcommit on the fly is that when your Hadoop VM is still running, someone else can add another VM on the same host with a higher CPU overcommit (e.g. 200%), violating (1) and thus affecting your Hadoop VM also. The idea of putting the host in an aggregate can give you (1) and (2). (4) is done by AggregateInstanceExtraSpecsFilter. However, it does not give you (3); which can be done with pCloud.

Step 1: use flavors so nova can tell between the two workloads, and configure them differently
Step 2: find capacity for your workload given your current cloud usage

At the moment, most of our solutions involve reserving bits of your cloud capacity for different workloads, generally using host aggregates. The issue with claiming back capacity from other workloads is a bit trickier. The issue is I don't think you have defined where you get that capacity back from? Maybe you want to look at giving some workloads a higher priority over the constrained CPU resources? But you will probably starve the little people out at random, which seems bad.
Maybe you want to have a concept of spot instances where they can use your spare capacity until you need it, and you can just kill them?

But maybe I am misunderstanding your use case; it's not totally clear to me.

John
[openstack-dev] [Nova] [Ironic] scheduling flow with Ironic?
Hi,

Is there documentation somewhere on the scheduling flow with Ironic? The reason I am asking is that we would like to get virtualized and bare-metal workloads running in the same cloud (ideally with the ability to repurpose physical machines between bare-metal workloads and virtualized workloads), and would like to better understand where the gaps are (and potentially help bridge them).

Thanks,
Alex
Re: [openstack-dev] [nova] Configure overcommit policy
You can consider having a separate host aggregate for Hadoop, and use a combination of AggregateInstanceExtraSpecsFilter (with a special flavor mapped to this host aggregate) and AggregateCoreFilter (overriding cpu_allocation_ratio for this host aggregate to be 1); see the sketch after this message.

Regards,
Alex

From: John Garbutt j...@johngarbutt.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 12/11/2013 04:41 PM
Subject: Re: [openstack-dev] [nova] Configure overcommit policy

On 11 November 2013 12:04, Alexander Kuznetsov akuznet...@mirantis.com wrote:

Hi all,

While studying Hadoop performance in a virtual environment, I found an interesting problem with Nova scheduling. In an OpenStack cluster, we have an overcommit policy, allowing to put on one compute more VMs than there are resources available for them. While it might be suitable for general types of workload, this is definitely not the case for Hadoop clusters, which usually consume 100% of system resources. Is there any way to tell Nova to schedule specific instances (the ones which consume 100% of system resources) without overcommitting resources on the compute node?

You could have a flavor with a no-overcommit extra spec, and modify the over-commit calculation in the scheduler in that case, but I don't remember seeing that in there.

John
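[Editorial note] A sketch of that recipe via python-novaclient, assuming AggregateInstanceExtraSpecsFilter and AggregateCoreFilter are enabled in the scheduler's filter list; the names, sizes, and credentials are placeholders:

# Carve out a no-overcommit pocket for Hadoop: a host aggregate whose
# cpu_allocation_ratio is 1.0, reachable only via a dedicated flavor.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'admin',
                     'http://keystone:5000/v2.0')

agg = nova.aggregates.create('hadoop-hosts', None)
nova.aggregates.add_host(agg.id, 'compute-07')
# AggregateCoreFilter reads the cpu_allocation_ratio key to override the
# global ratio for hosts in this aggregate:
nova.aggregates.set_metadata(agg.id, {'cpu_allocation_ratio': '1.0',
                                      'hadoop': 'true'})

# A flavor whose extra spec only matches hosts in that aggregate
# (enforced by AggregateInstanceExtraSpecsFilter):
flavor = nova.flavors.create(name='m1.hadoop', ram=16384, vcpus=8, disk=160)
flavor.set_keys({'hadoop': 'true'})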
Re: [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo scheduler/filters for nova and cinder
Hi Boris,

This is a very interesting approach. How do you envision the life cycle of such a scheduler in terms of code repository, build, test, etc? What kind of changes to provisioning APIs do you envision to 'feed' such a scheduler? Any particular reason you didn't mention Neutron? Also, there are some interesting technical challenges (e.g., state management across a potentially large number of instances of memcached).

Thanks,
Alex

Boris Pavlovic bpavlo...@mirantis.com wrote on 10/11/2013 07:05:42 PM:

From: Boris Pavlovic bpavlo...@mirantis.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 10/11/2013 07:07 PM
Subject: Re: [openstack-dev] [nova][cinder][oslo][scheduler] How to leverage oslo scheduler/filters for nova and cinder

Jay,

Hi Jay, yes we were working on putting all common stuff in oslo-scheduler (not only filters). As a result of this work we understood that this is the wrong approach, because it makes the resulting code very complex and unclear. And actually we didn't find a way to put all common stuff inside oslo. Instead of trying to make life too complex we found a better approach: implement scheduler-as-a-service that can scale (the current solution has some scale issues) and store all data from nova, cinder and probably other places.

To implement such an approach we should change the current architecture a bit:
1) Scheduler should store all its data (not nova.db / cinder.db)
2) Scheduler should always have its own snapshot of world state, and sync it with other schedulers using something that is quite fast (e.g. memcached)
3) Merge scheduler rpc methods from nova & cinder into one scheduler (it is possible if we store all data from cinder & nova in one scheduler).
4) Drop the cinder and nova tables that store host states (as we don't need them)

We implemented already a base start (a mechanism that stores a snapshot of world state & syncs it between different schedulers): https://review.openstack.org/#/c/45867/ (it is still a bit WIP)

Best regards,
Boris Pavlovic
---
Mirantis Inc.

On Sun, Nov 10, 2013 at 1:59 PM, Jay Lau jay.lau@gmail.com wrote:

I noticed that there is already a bp in oslo tracking what I want to do: https://blueprints.launchpad.net/oslo/+spec/oslo-scheduler

Thanks,
Jay

2013/11/9 Jay Lau jay.lau@gmail.com

Greetings,

Now in oslo, we already put some scheduler filters/weights logic there, and cinder is using the oslo scheduler filters/weights logic; it seems we want both nova & cinder to use this logic in the future. I found some problems, as follows:

1) In cinder, some filters/weights logic resides in cinder/openstack/common/scheduler and some filter/weight logic in cinder/scheduler; this is not consistent and will also make some cinder hackers confused: where shall I put the scheduler filter/weight?
2) Nova is not using filters/weights from oslo and also not using entry points to handle all filters/weights.
3) There are not enough filters in oslo; we may need to add more there, such as same host filter, different host filter, retry filter, etc.

So my proposal is as follows:
1) Add more filters to oslo, such as same host filter, different host filter, retry filter, etc.
2) Move all filters/weights logic in cinder from cinder/scheduler to cinder/openstack/common/scheduler
3) Enable nova to use filter/weight logic from oslo (move all filter logic to nova/openstack/common/scheduler) and also use entry points to handle all filters/weights logic.

Comments?
Thanks,
Jay
Re: [openstack-dev] [nova][scheduler] Instance Group Model and APIs - Updated document with an example request payload
Mike Spreitzer mspre...@us.ibm.com wrote on 30/10/2013 06:11:04 AM:

Alex also wrote: "I wonder whether it is possible to find an approach that takes into account cross-resource placement considerations (VM-to-VM communicating over the application network, or VM-to-volume communicating over the storage network), but does not require delivering all the intimate details of the entire environment to a single place -- which probably can not be either of Nova/Cinder/Neutron/etc.. but can we still use the individual schedulers in each of them, with a partial view of the environment, to drive a placement decision which is consistently better than random?"

I think you could create a cross-scheduler protocol that would accomplish joint placement decision making --- but would not want to. It would involve a lot of communication, and the subject matter of that communication would be most of what you need in a centralized placement solver anyway. You do not need all the intimate details, just the bits that are essential to making the placement decision.

The amount of communication depends on the protocol, and on what exactly needs to be shared. Maybe there is a range of options here that we can potentially explore, between what exists today (Heat talking to each of the components, retrieving local information about availability zones, flavors and volume types, existing resources, etc., and communicating back with scheduler hints) and having a centralized DB that keeps the entire data model. Also, maybe different points on the continuum between 'share few' and 'share a lot' would be a good match for different kinds of environments and different kinds of workload mix (for example, as you pointed out, in an environment with a flat network and centralized storage, the sharing can be rather minimal).

Alex Glikson asked why not go directly to holistic if there is no value in doing Nova-only. Yathi replied to that concern, and let me add some notes. I think there *are* scenarios in which doing Nova-only joint policy-based scheduling is advantageous.

Great, I am not trying to claim that such scenarios do not exist - I am just saying that it is important to spell them out, to better understand the trade-off between the benefit and the complexity, and to make sure our design is flexible enough to accommodate the high-priority ones, and extensible enough to accommodate the rest going forward.

Regards,
Alex
Re: [openstack-dev] [OpenStack-dev][Nova][Discussion]Blueprint : Auto VM Discovery in OpenStack for existing workload
Maybe a more appropriate approach could be to have a tool/script that does it, as a one-time thing. For example, it could make sense in a scenario when the Nova DB gets lost or corrupted, a new Nova controller is deployed, and the DB needs to be recreated. Potentially, since the Nova DB is primarily a cache, this could be done by 'discovery' (maybe with some manual intervention) - instead of dealing with backup/restore of the DB, or similar approaches. A sketch of what such a discovery pass might look like follows this message.

Regards,
Alex

From: Russell Bryant rbry...@redhat.com
To: openstack-dev@lists.openstack.org
Date: 30/10/2013 08:52 AM
Subject: Re: [openstack-dev] [OpenStack-dev][Nova][Discussion]Blueprint : Auto VM Discovery in OpenStack for existing workload

On 10/30/2013 02:36 AM, Swapnil Kulkarni wrote:

I had a discussion with russellb regarding this yesterday, and I would like to discuss this with the team regarding the blueprint mentioned in the subject.

https://blueprints.launchpad.net/nova/+spec/auto-vm-discovery-on-hypervisor

Description: Organizations opting to use OpenStack can have a varied amount of workload that they would like to be available directly with the use of some discovery workflows. One common usage of this would be existing virtual machines present on the hypervisors. If these instances can be discovered by the compute agent during discovery, it would help to use OpenStack to manage the existing workload directly. Auto VM Discovery will enable this functionality initially for KVM guests, the widely used hypervisor configuration in OpenStack deployments, and enhance it further for other hypervisors.

I feel that Nova managing VMs that it didn't create is not an appropriate use case to support.

--
Russell Bryant
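[Editorial note] A minimal sketch of what the discovery part of such a tool could look like, using the libvirt Python bindings; mapping this inventory back onto Nova records is the hard part and is left out:

# Enumerate KVM guests on one compute node so an operator could re-seed
# a lost Nova DB. Requires the libvirt-python bindings and local access.
import libvirt

conn = libvirt.open('qemu:///system')
for dom in conn.listAllDomains():
    # info() returns [state, maxMem(KB), memory(KB), nrVirtCpu, cpuTime]
    state, _maxmem_kb, mem_kb, vcpus, _cputime = dom.info()
    print('found guest %s (uuid=%s, vcpus=%d, mem=%dMB, state=%d)' %
          (dom.name(), dom.UUIDString(), vcpus, mem_kb // 1024, state))
conn.close()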
Re: [openstack-dev] [OpenStack-dev][Nova][Discussion]Blueprint : Auto VM Discovery in OpenStack for existing workload
Russell Bryant rbry...@redhat.com wrote on 30/10/2013 10:20:34 AM: On 10/30/2013 03:13 AM, Alex Glikson wrote: Maybe a more appropriate approach could be to have a tool/script that does it, as a one-time thing. For example, it could make sense in a scenario where the Nova DB gets lost or corrupted, a new Nova controller is deployed, and the DB needs to be recreated. Potentially, since the Nova DB is primarily a cache, this could be done by 'discovery' (maybe with some manual intervention) - instead of dealing with backup/restore of the DB, or similar approaches. The need for this sort of thing makes more sense for traditional datacenter virtualization, but not as much for cloud. That's the root of my objection. Not sure I understand why. Do you assume that the cloud necessarily runs stateless workloads, so that losing VM instances is not an issue? (What about volumes that could've been attached to them?) Or do you assume that the cloud is necessarily deployed in an HA configuration that never gets broken? Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
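As a rough illustration of what such a one-time discovery tool could start from on a KVM host, assuming the libvirt Python bindings (the mapping from domains back to Nova records is deliberately left out):

    import libvirt

    # Enumerate guests already present on a hypervisor, as raw input for
    # recreating their records in a rebuilt Nova DB.
    conn = libvirt.open('qemu:///system')
    for dom in conn.listAllDomains():
        print('%s %s active=%d' % (dom.UUIDString(), dom.name(), dom.isActive()))
        # A real tool would map each domain back to an image, flavor and
        # tenant (probably with some manual intervention) before writing
        # anything into the instances table.
    conn.close()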
Re: [openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey
There is a ZK-backed driver in the Nova service heartbeat mechanism ( https://blueprints.launchpad.net/nova/+spec/zk-service-heartbeat) -- it would be interesting to know whether it is widely used (might be worth asking on the general ML, or in user groups). There have also been discussions on using it for other purposes (some listed towards the bottom of https://wiki.openstack.org/wiki/NovaZooKeeperHeartbeat). While I am not aware of any particular progress with implementing any of them, I think they still make sense and could be useful. Regards, Alex From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org, Date: 30/10/2013 07:45 PM Subject:[openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey So, recently we've had quite a long thread in gerrit regarding locking in Heat: https://review.openstack.org/#/c/49440/ In the patch, there are two distributed lock drivers. One uses SQL, and suffers from all the problems you might imagine a SQL-based locking system would. It is extremely hard to detect dead lock holders, so we end up with really long timeouts. The other is ZooKeeper. I'm on record as saying we're not using ZooKeeper. It is a little embarrassing to have taken such a position without really thinking things through. The main reason I feel this way, though, is not because ZooKeeper wouldn't work for locking, but because I think locking is a mistake. The current multi-engine paradigm has a race condition. If you have a stack action going on, the state is held in the engine itself, and not in the database, so if another engine starts working on another action, they will conflict. The locking paradigm is meant to prevent this. But I think this is a huge mistake. The engine should store _all_ of its state in a distributed data store of some kind. Any engine should be aware of what is already happening with the stack from this state and act accordingly. That includes the engine currently working on actions. When viewed through this lens, to me, locking is a poor excuse for serializing the state of the engine scheduler. It feels like TaskFlow is the answer, with an eye for making sure TaskFlow can be made to work with distributed state. I am not well versed in TaskFlow's details though, so I may be wrong. It worries me that TaskFlow has existed a while and doesn't seem to be solving real problems, but maybe I'm wrong and it is actually in use already. Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper has some real advantages over using the database. But there is hesitance because it is not widely supported in OpenStack. What say you, OpenStack community? Should we keep ZooKeeper out of our.. zoo? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
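For reference, a minimal sketch of what a ZooKeeper-backed stack lock could look like with the kazoo client library; the znode path and identifiers below are illustrative, not taken from the patch under review:

    from kazoo.client import KazooClient

    stack_id = 'stack-uuid'   # illustrative values
    engine_id = 'engine-1'

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    # One lock znode per stack. If the engine holding the lock dies, its
    # ephemeral node disappears and the lock is released automatically --
    # exactly the dead-holder detection that is hard with a SQL row.
    lock = zk.Lock('/heat/stacks/%s' % stack_id, identifier=engine_id)
    with lock:  # blocks until acquired
        print('engine %s owns stack %s' % (engine_id, stack_id))
        # ... perform the stack action here ...
    zk.stop()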
Re: [openstack-dev] [nova][scheduler] Instance Group Model and APIs - Updated document with an example request payload
Andrew Laski andrew.la...@rackspace.com wrote on 29/10/2013 11:14:03 PM: [...] Having Nova call into Heat is backwards IMO. If there are specific pieces of information that Nova can expose, or API capabilities to help with orchestration/placement that Heat or some other service would like to use, then let's look at that. Nova has placement concerns that extend to finding a capable hypervisor for the VM that someone would like to boot, and then just slightly beyond. +1 If there are higher-level placement decisions to be made, I think that belongs outside of Nova; then just tell Nova where to put it. I wonder whether it is possible to find an approach that takes into account cross-resource placement considerations (VM-to-VM communicating over the application network, or VM-to-volume communicating over the storage network), but does not require delivering all the intimate details of the entire environment to a single place -- which probably cannot be any of Nova/Cinder/Neutron/etc. But can we still use the individual schedulers in each of them, with a partial view of the environment, to drive a placement decision which is consistently better than random? Regards, Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][scheduler] Instance Group Model and APIs - Updated document with an example request payload
If we envision the main benefits only after (parts of) this logic moves outside of Nova (and starts addressing other resources) -- would it still be worth maintaining on the order of 5K LOC in Nova to support this feature? Why not go for the 'ultimate' solution in the first place then, keeping in Nova only the mandatory enablement (TBD)? Alternatively, if we think that there is value in having this just in Nova -- it would be good to understand the exact scenarios which do not require awareness of other resources (and see if they are important enough to maintain those 5K LOC), and how exactly this can gradually evolve into the 'ultimate' solution. Or am I missing something? Alex From: Yathiraj Udupi (yudupi) yud...@cisco.com To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org, Date: 29/10/2013 11:46 PM Subject:Re: [openstack-dev] [nova][scheduler] Instance Group Model and APIs - Updated document with an example request payload Thanks Alex, Mike, Andrew, Russell for your comments. This ongoing API discussion started in our scheduler meetings, as a first step in tackling the smart resource placement ideas - see the doc for reference - https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit This roadmap calls for unified resource placement decisions covering resources across services, starting from a complete topology request with all the necessary nodes/instances/resources, their connections, and the policies. However we agreed that we will first address defining the required APIs, and start the effort to make this happen within Nova, using VM instance groups with policies. Hence this proposal for the instance groups. The entire group needs to be placed as a whole; at least the first step is to find ideal placement choices for the entire group. Once the placement has been identified (using a smart resource placement engine that addresses solving the entire group), we then focus on ways to schedule them as a whole. This is not part of the API discussion, however it is important for the smart resource placement ideas. This definitely involves concepts such as reservation, etc. Heat or Heat APIs could be a choice to enable the final orchestration, but I am not commenting on that here. The APIs effort here is an attempt to provide clean interfaces now to be able to represent instance groups, save them, and also define APIs to create them. The actual implementation will have to rely on one or more services to 1. make the resource placement decisions, and 2. actually provision them, orchestrating them in the right order, etc. The placement decisions themselves can happen in a module that can be a separate service, and can be reused by different services; it also needs to have a global vision of all the resources. (Again, all of this is part of the scope of the smart resource placement topic.) Thanks, Yathi. On 10/29/13, 2:14 PM, Andrew Laski andrew.la...@rackspace.com wrote: On 10/29/13 at 04:05pm, Mike Spreitzer wrote: Alex Glikson glik...@il.ibm.com wrote on 10/29/2013 03:37:41 AM: 1. I assume that the motivation for rack-level anti-affinity is to survive a rack failure. Is this indeed the case? This is a very interesting and important scenario, but I am curious about your assumptions regarding all the other OpenStack resources and services in this respect. Remember we are just starting on the roadmap. Nova in Icehouse, holistic later 2.
What exactly do you mean by network reachability between the two groups? Remember that we are in Nova (at least for now), so we don't have much visibility into the topology of the physical or virtual networks. Do you have some concrete thoughts on how such a policy can be enforced, in the presence of a potentially complex environment managed by Neutron? I am aiming for the holistic future, and Yathi copied that from an example I drew with the holistic future in mind. While we are only addressing Nova, I think a network reachability policy is inappropriate. 3. The JSON somewhat reminds me of the interface of Heat, and I would assume that certain capabilities that would be required to implement it would be similar too. What is the proposed approach to 'harmonize' between the two, in environments that include Heat? What would be the end-to-end flow? For example, who would do the orchestration of individual provisioning steps? Would the create operation delegate back to Heat for that? Also, how would other relationships managed by Heat (e.g., links to storage and network) be incorporated in such an end-to-end scenario? You raised a few interesting issues. 1. Heat already has a way to specify resources, I do not see why we should invent another. 2. Should Nova call Heat to do the orchestration? I would like to see an example where ordering is an issue. IMHO, since OpenStack already has a solution for creating
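For readers without access to the referenced document, a purely hypothetical illustration of the kind of request payload under discussion (all field names are invented here, not quoted from the proposal):

    # A group of four web VMs with rack-level anti-affinity, expressed as
    # the JSON-style body a client might POST to the new API.
    group_request = {
        "instance_group": {
            "name": "web-tier",
            "policies": [
                {"type": "anti-affinity", "scope": "rack"},
            ],
            "members": [
                {"name": "web-%d" % i, "flavor": "m1.small", "image": "web-image"}
                for i in range(4)
            ],
        }
    }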
Re: [openstack-dev] [nova] Thoughs please on how to address a problem with mutliple deletes leading to a nova-compute thread pool problem
+1 Regards, Alex Joshua Harlow harlo...@yahoo-inc.com wrote on 26/10/2013 09:29:03 AM: An idea that others and I are having for a similar use case in cinder (or it appears to be similar). If there were well-defined state machine(s) in nova, with well-defined and managed transitions between states, then it seems like such a state machine could resume on failure as well as be interrupted when a dueling or preemptable operation arrives (a delete while being created, for example). This way, not only would the set of states and transitions be very clear, but it would also be clear how preemption occurs (and in which cases). Right now in nova there is a distributed and ad-hoc state machine which, if it were more formalized, could inherit some of the described useful capabilities. It would also be much more resilient to these types of locking problems that you described. IMHO that's the only way these types of problems will be fully fixed: not by more queues or more periodic tasks, but by solidifying/formalizing the state machines that compose the work nova does. Sent from my really tiny device... On Oct 25, 2013, at 3:52 AM, Day, Phil philip@hp.com wrote: Hi Folks, We're very occasionally seeing problems where a thread processing a create hangs (and we've seen it when talking to Cinder and Glance). Whilst those issues need to be hunted down in their own right, they do show up what seems to me to be a weakness in the processing of delete requests that I'd like to get some feedback on. Delete is the one operation that is allowed regardless of the Instance state (since it's a one-way operation, and users should always be able to free up their quota). However when we get a create thread hung in one of these states, the delete requests, when they hit the manager, will also block as they are synchronized on the uuid. Because the user making the delete request doesn't see anything happen they tend to submit more delete requests. The Service is still up, so these go to the compute manager as well, and eventually all of the threads will be waiting for the lock, and the compute manager will stop consuming new messages. The problem isn't limited to deletes - although in most cases the change of state in the API means that you have to keep making different calls to get past the state-checker logic to trigger it with an instance stuck in another state. Users also seem to be more impatient with deletes, as they are trying to free up quota for other things. So while I know that we should never get a thread into a hung state in the first place, I was wondering about one of the following approaches to address just the delete case: i) Change the delete call on the manager so it doesn't wait for the uuid lock. Deletes should be coded so that they work regardless of the state of the VM, and other actions should be able to cope with a delete being performed from under them. There is of course no guarantee that the delete itself won't block as well. ii) Record in the API server that a delete has been started (maybe enough to use the task state being set to DELETING in the API if we're sure this doesn't get cleared), and add a periodic task in the compute manager to check for and delete instances that have been in a DELETING state for more than some timeout (see the sketch after this message). Then the API, knowing that the delete will be processed eventually, can just no-op any further delete requests.
iii) Add some hook into the ServiceGroup API so that the timer could depend on getting a free thread from the compute manager pool (i.e., run some no-op task) - so that if there are no free threads then the service becomes down. That would (eventually) stop the scheduler from sending new requests to it, and let deletes be processed in the API server, but won't of course help with commands for other instances on the same host. iv) Move away from having a general topic and thread pool for all requests, and start a listener on an instance-specific topic for each running instance on a host (leaving the general topic and pool just for creates and other non-instance calls like the hypervisor API). Then a blocked task would only affect requests for a specific instance. I'm tending towards ii) as a simple and pragmatic solution in the near term, although I like both iii) and iv) as generally good enhancements - but iv) in particular feels like a pretty seismic change. Thoughts please, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
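A back-of-the-envelope sketch of option ii), with hypothetical helpers standing in for the real DB query and the manager's non-blocking delete:

    from datetime import datetime, timedelta

    DELETE_TIMEOUT = timedelta(minutes=10)  # illustrative value

    def reap_stuck_deletes(instances, force_delete):
        """Periodic task body: re-issue deletes for instances whose
        task_state has been DELETING for longer than DELETE_TIMEOUT.
        `instances` stands in for a DB query on task_state=DELETING;
        `force_delete` stands in for the manager's delete path."""
        now = datetime.utcnow()
        for inst in instances:
            if now - inst.updated_at > DELETE_TIMEOUT:
                # Must not synchronize on the instance uuid, or this
                # would queue up behind the very thread that is hung.
                force_delete(inst)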
Re: [openstack-dev] Towards OpenStack Disaster Recovery
Hi Caitlin, Caitlin Bestler caitlin.best...@nexenta.com wrote on 21/10/2013 06:51:36 PM: On 10/21/2013 2:34 AM, Avishay Traeger wrote: Hi all, We (IBM and Red Hat) have begun discussions on enabling Disaster Recovery (DR) in OpenStack. We have created a wiki page with our initial thoughts: https://wiki.openstack.org/wiki/DisasterRecovery We encourage others to contribute to this wiki. What wasn't clear to me on first read is what the intended scope is. Exactly what is being failed over? An entire multi-tenant data-center? Specific tenants? Or specific enumerated sets of VMs for one tenant? Our assumption is that an entire DC is failing, while only a (potentially small) subset of the VMs/etc. needs to be protected/recovered. Regards, Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][scheduler] A new blueprint for Nova-scheduler: Policy-based Scheduler
This sounds very similar to https://blueprints.launchpad.net/nova/+spec/multiple-scheduler-drivers We worked on it in Havana, learned a lot from the feedback during the review cycle, and hopefully will finalize the details at the summit and be able to finish the implementation in Icehouse. Would be great to collaborate. Regards, Alex From: Khanh-Toan Tran khanh-toan.t...@cloudwatt.com To: openstack-dev@lists.openstack.org, Date: 16/10/2013 01:42 PM Subject:[openstack-dev] [nova][scheduler] A new blueprint for Nova-scheduler: Policy-based Scheduler Dear all, I've registered a new blueprint for nova-scheduler. The purpose of the blueprint is to propose a new scheduler that is based on policy: https://blueprints.launchpad.net/nova/+spec/policy-based-scheduler With the current FilterScheduler, the admin cannot change his placement policy without restarting nova-scheduler. Neither can he define a local policy for a group of resources (say, an aggregate), or for a particular client. Thus we propose this scheduler to provide the admin with the capability of defining/changing his placement policy at runtime. The placement policy can be global (concerning all resources), local (concerning a group of resources), or tenant-specific. Please don't hesitate to contact us for discussion; all your comments are welcome! Best regards, Khanh-Toan TRAN Cloudwatt Email: khanh-toan.tran[at]cloudwatt.com 892 Rue Yves Kermen 92100 BOULOGNE-BILLANCOURT FRANCE ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
IMO, the three themes make sense, but I would suggest waiting until the submission deadline and discussing at the following IRC meeting on the 22nd. Maybe there will be more relevant proposals to consider. Regards, Alex P.S. I plan to submit a proposal regarding scheduling policies, and maybe one more related to theme #1 below From: Day, Phil philip@hp.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 14/10/2013 06:50 PM Subject:Re: [openstack-dev] Scheduler meeting and Icehouse Summit Hi Folks, In the weekly scheduler meeting we've been trying to pull together a consolidated list of Summit sessions so that we can find logical groupings and make a more structured set of sessions for the limited time available at the summit. https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions With the deadline for sessions being this Thursday 17th, tomorrow's IRC meeting is the last chance to decide which sessions we want to combine / prioritize. Russell has indicated that a starting assumption of three scheduler sessions is reasonable, with any extras depending on what else is submitted. I've matched the list on the etherpad to submitted sessions below, and added links to any other proposed sessions that look like they are related. 1) Instance Group Model and API Session Proposal: http://summit.openstack.org/cfp/details/190 2) Smart Resource Placement: Session Proposal: http://summit.openstack.org/cfp/details/33 Possibly related sessions: Resource optimization service for nova ( http://summit.openstack.org/cfp/details/201) 3) Heat and Scheduling and Software, Oh My!: Session Proposal: http://summit.openstack.org/cfp/details/113 4) Generic Scheduler Metrics and Ceilometer: Session Proposal: http://summit.openstack.org/cfp/details/218 Possibly related sessions: Making Ceilometer and Nova play nice http://summit.openstack.org/cfp/details/73 5) Image Properties and Host Capabilities Session Proposal: NONE 6) Scheduler Performance: Session Proposal: NONE Possibly related Sessions: Rethinking Scheduler Design http://summit.openstack.org/cfp/details/34 7) Scheduling Across Services: Session Proposal: NONE 8) Private Clouds: Session Proposal: http://summit.openstack.org/cfp/details/228 9) Multiple Scheduler Policies: Session Proposal: NONE The proposal from last week's meeting was to use the three slots for: - Instance Group Model and API (1) - Smart Resource Placement (2) - Performance (6) However, at the moment there doesn't seem to be a session proposed to cover the performance work? It also seems to me that the Group Model and Smart Placement are pretty closely linked, along with (3) (which says it wants to combine 1 & 2 into the same topic), so if we only have three slots available then these look like logical candidates for consolidating into a single session. That would free up a session to cover the generic metrics (4) and Ceilometer - where a lot of work in Havana stalled because we couldn't get a consensus on the way forward. The third slot would be kept for performance - which, based on the lively debate in the scheduler meetings, I'm assuming will still be submitted as a session. Private Clouds isn't really a scheduler topic, so I suggest it takes its chances as a general session. Hence my revised proposal for the three slots is: i) Group Scheduling / Smart Placement / Heat and Scheduling (1), (2), (3), (7) - How do you schedule something more complex than a single VM?
ii) Generalized scheduling metrics / Ceilometer integration (4) - How do we extend the set of resources a scheduler can use to make its decisions? - How do we make this work with / be compatible with Ceilometer? iii) Scheduler Performance (6) That way we will at least give airtime to all of the topics. If a 4th scheduler slot becomes available then we could break the first session up into two parts. Thoughts welcome here or in tomorrow's IRC meeting. Cheers, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] Policy Model
I would suggest not generalizing too much -- e.g., restrict the discussion to PlacementPolicy. If anyone else wants to use a similar construct for other purposes, it can be generalized later. For example, the notion of 'policy' already exists in other places in OpenStack in the context of security, and we also plan to introduce a different kind of 'policies' for scheduler configurations in different managed domains (e.g., aggregates), but I wonder whether it is important (or makes sense) to make all of them 'inherit' from the same base model. Regards, Alex From: Yathiraj Udupi (yudupi) yud...@cisco.com To: Mike Spreitzer mspre...@us.ibm.com, Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org Date: 15/10/2013 07:33 AM Subject:Re: [openstack-dev] [scheduler] Policy Model The Policy model object has a lifecycle of its own. This is because this policy object can possibly be used outside the scope of the InstanceGroup effort. Hence I don't see a problem with a policy administrator, or any user, if allowed, maintaining this set of policies outside the scope of InstanceGroups. However, a group author will maintain the InstanceGroupPolicy objects, and refer to a policy that is appropriate to his requirement. If a new Policy object needs to be registered for a new requirement, that has to be done by this user, if allowed. About your question regarding dangling references: that situation should not be allowed, hence a delete of the Policy object should not be allowed if some other object refers to it. This can be implemented properly by adding an association between the models. This way, a generic Policy model can apply to other scenarios that may come up in OpenStack. Regards, Yathi. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Thanks for the pointer -- was not able to attend that meeting, unfortunately. A couple of observations, based on what I've heard so far. 1. I think it is important not to restrict the discussion to Nova resources. So, I like the general direction in [1] to target a generic mechanism and API. However, once we start following that path, it becomes more challenging to figure out which component should manage those cross-resource constructs (Heat sounds like a reasonable candidate -- which seems consistent with the proposal at [2]), and what the API should be between it and the services deciding on the actual placement of individual resources (Nova, Cinder, Neutron). 2. Moreover, we should take into account that we may need to consider multiple sources of topology -- physical (maybe provided by Ironic, affecting availability -- hosts, racks, etc.), virtual-compute (provided by Nova, affecting resource isolation -- mainly hosts), virtual-network (affecting connectivity and bandwidth/latency; think of SDN policies enforcing routing and QoS almost orthogonally to the physical topology), and virtual-storage (affecting VM-to-volume connectivity and bandwidth/latency; think of an FC network implying a topology different from both the physical one and the IP network one). I wonder whether we will be able to come up with a simple-enough initial approach/implementation, which would not limit the ability to extend/customize it going forward to cover all of the above. Regards, Alex [1] https://docs.google.com/document/d/17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit [2] https://wiki.openstack.org/wiki/Heat/PolicyExtension Alex Glikson Manager, Cloud Operating System Technologies, IBM Haifa Research Lab http://w3.haifa.ibm.com/dept/stt/cloud_sys.html | http://www.research.ibm.com/haifa/dept/stt/cloud_sys.shtml Email: glik...@il.ibm.com | Phone: +972-4-8281085 | Mobile: +972-54-647 | Fax: +972-4-8296112 From: Mike Spreitzer mspre...@us.ibm.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 10/10/2013 07:59 AM Subject:Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft Yes, there is more than the northbound API to discuss. Gary started us there in the Scheduler chat on Oct 1, when he broke the issues down like this: 11:12:22 AM garyk: 1. a user facing API 11:12:41 AM garyk: 2. understanding which resources need to be tracked 11:12:48 AM garyk: 3. backend implementation The full transcript is at http://eavesdrop.openstack.org/meetings/scheduling/2013/scheduling.2013-10-01-15.08.log.html Alex Glikson glik...@il.ibm.com wrote on 10/09/2013 02:14:03 AM: Good summary. I would also add that in A1 the schedulers (e.g., in Nova and Cinder) could talk to each other to coordinate. Besides defining the policy, and the user-facing APIs, I think we should also outline those cross-component APIs (need to think whether they have to be user-visible, or can be admin). Regards, Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Good summary. I would also add that in A1 the schedulers (e.g., in Nova and Cinder) could talk to each other to coordinate. Besides defining the policy, and the user-facing APIs, I think we should also outline those cross-component APIs (need to think whether they have to be user-visible, or can be admin). Regards, Alex From: Mike Spreitzer mspre...@us.ibm.com To: Yathiraj Udupi (yudupi) yud...@cisco.com, Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org Date: 09/10/2013 08:46 AM Subject:Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft Thanks for the clue about where the request/response bodies are documented. Is there any convenient way to view built documentation for Havana right now? You speak repeatedly of the desire for clean interfaces, and nobody could disagree with such words. I characterize my desire that way too. It might help me if you elaborate a little on what clean means to you. To me it is about minimizing the number of interactions between different modules/agents and the amount of information in those interactions. In short, it is about making narrow interfaces - a form of simplicity. To me the most frustrating aspect of this challenge is the need for the client to directly mediate the dependencies between resources; this is really what is driving us to do ugly things. As I mentioned before, I am coming from a setting that does not have this problem. So I am thinking about two alternatives: (A1) how clean can we make a system in which the client continues to directly mediate dependencies between resources, and (A2) how easily and cleanly can we make that problem go away. For A1, we need the client to make a distinct activation call for each resource. You have said that we should start the roadmap without joint scheduling; in this case, the scheduling can continue to be done independently for each resource and can be bundled with the activation call. That can be the call we know and love today, the one that creates a resource, except that it needs to be augmented to also carry some pointer that points into the policy data so that the relevant policy data can be taken into account when making the scheduling decision. Ergo, the client needs to know this pointer value for each resource. The simplest approach would be to let that pointer be the combination of (p1) a VRT's UUID and (p2) the local name for the resource within the VRT. Other alternatives are possible, but require more bookkeeping by the client. I think that at the first step of the roadmap for A1, the client/service interaction for CREATE can be in just two phases. In the first phase the client presents a topology (top-level InstanceGroup in your terminology), including resource definitions, to the new API for registration; the response is a UUID for that registered top-level group. In the second phase the client creates the resources as is done today, except that each creation call is augmented to carry the aforementioned pointer into the policy information. Each resource scheduler (just Nova, at first) can use that pointer to access the relevant policy information and take it into account when scheduling. The client/service interaction for UPDATE would be in the same two phases: first update the policy/resource definitions at the new API, then do the individual resource updates in dependency order. I suppose the second step in the roadmap is to have Nova do joint scheduling.
The client/service interaction pattern can stay the same. The only difference is that Nova makes the scheduling decisions in the first phase rather than the second. But that is not a detail exposed to the clients. Maybe the third step is to generalize beyond nova? For A2, the first question is how to remove user-level create-time dependencies between resources. We are only concerned with the user-level create-time dependencies here because it is only they that drive intimate client interactions. There are also create-time dependencies due to the nature of the resource APIs; for example, you can not attach a volume to a VM until after both have been created. But handling those kinds of create-time dependencies does not require intimate interactions with the client. I know of two software orchestration technologies developed in IBM, and both have the property that there are no user-level create-time dependencies between resources; rather, the startup code (userdata) that each VM runs handles dependencies (using a library for cross-VM communication and synchronization). This can even be done in plain CFN, using wait conditions and handles (albeit somewhat clunkily), right? So I think there are ways to get this nice property already. The next question is how best to exploit it to make cleaner APIs. I think we can have a one-step
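To make the A1 flow concrete, a hand-wavy sketch in which topology_api and the scheduler-hint key are hypothetical, and nova is a python-novaclient instance:

    # A tiny topology document; in practice this would be the top-level
    # InstanceGroup with its policies (omitted here).
    topology_doc = {'resources': {
        'web-1': {'image': 'image-uuid', 'flavor': '2'},
        'web-2': {'image': 'image-uuid', 'flavor': '2'},
    }}

    # Phase 1: register the topology, getting back a UUID (hypothetical API).
    group = topology_api.register(topology_doc)

    # Phase 2: ordinary per-resource creates, each carrying a pointer into
    # the registered policy data: (group UUID, local resource name).
    for name, spec in topology_doc['resources'].items():
        nova.servers.create(
            name, image=spec['image'], flavor=spec['flavor'],
            scheduler_hints={'group': '%s/%s' % (group.uuid, name)})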
Re: [openstack-dev] [nova] automatically evacuate instances on compute failure
Seems that this can be broken into 3 incremental pieces. First, it would be great if the ability to schedule a single 'evacuate' were finally merged ( https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance ). Then, it would make sense to have the logic that evacuates an entire host ( https://blueprints.launchpad.net/python-novaclient/+spec/find-and-evacuate-host ). The reasoning behind suggesting that this should not necessarily be in Nova is, perhaps, that it *can* be implemented outside Nova using the individual 'evacuate' API. Finally, it should be possible to close the loop and invoke the evacuation automatically as a result of failure detection (not clear how exactly this would work, though). Hopefully we will have at least the first part merged soon (not sure if anyone is actively working on a rebase). Regards, Alex From: Syed Armani dce3...@gmail.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 09/10/2013 12:04 AM Subject:Re: [openstack-dev] [nova] automatically evacuate instances on compute failure Hi Folks, I am also very much curious about this. Earlier this bp had a dependency on the query scheduler, which is now merged. It will be very helpful if anyone can throw some light on the fate of this bp. Thanks. Cheers, Syed Armani On Wed, Sep 25, 2013 at 11:46 PM, Chris Friesen chris.frie...@windriver.com wrote: I'm interested in automatically evacuating instances in the case of a failed compute node. I found the following blueprint that covers exactly this case: https://blueprints.launchpad.net/nova/+spec/evacuate-instance-automatically However, the comments there seem to indicate that the code that orchestrates the evacuation shouldn't go into nova (referencing the Havana design summit). Why wouldn't this type of behaviour belong in nova? (Is there a summary of discussions at the summit?) Is there a recommended place where this sort of thing should go? Thanks, Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
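The second piece could look roughly like the following, built on the individual 'evacuate' API via python-novaclient (assuming the failed host has already been fenced, and a client recent enough to let the scheduler pick the target host):

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')  # illustrative credentials
    failed_host = 'compute-17'  # illustrative

    for server in nova.servers.list(
            search_opts={'host': failed_host, 'all_tenants': 1}):
        # No target host given -- rely on the scheduler to pick one
        # (the find-host-and-evacuate behaviour referenced above).
        nova.servers.evacuate(server, host=None, on_shared_storage=True)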
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Mike Spreitzer mspre...@us.ibm.com wrote on 01/10/2013 06:58:10 AM: Alex Glikson glik...@il.ibm.com wrote on 09/29/2013 03:30:35 PM: Mike Spreitzer mspre...@us.ibm.com wrote on 29/09/2013 08:02:00 PM: Another reason to prefer host is that we have other resources to locate besides compute. Good point. Another approach (not necessarily contradicting) could be to specify the location as a property of host aggregate rather than individual hosts (and introduce similar notion in Cinder, and maybe Neutron). This could be an evolution/generalization of the existing 'availability zone' attribute, which would specify a more fine-grained location path (e.g., 'az_A:rack_R1:chassis_C2:node_N3'). We briefly discussed this approach at the previous summit (see 'simple implementation' under https://etherpad.openstack.org/HavanaTopologyAwarePlacement) -- but unfortunately I don't think we made much progress with the actual implementation in Havana (would be good to fix this in Icehouse). Thanks for the background. I can still see the etherpad, but the old summit proposal to which it points is gone. The proposal didn't have much detail -- the main tool used at summit sessions is the etherpad. The etherpad proposes an API, and leaves open the question of whether it backs onto a common service. I think that is a key question. In my own group's work, this sort of information is maintained in a shared database. I'm not sure what the right approach is for OpenStack. IMO, it does make sense to have a service which maintains the physical topology. Tuskar sounds like a good candidate. Then it can 'feed' Nova/Cinder/Neutron with the relevant aggregation entities (like host aggregates in Nova) and their attributes, to be used by the scheduler within each of them. Alternatively, this could be done by the administrator (manually, or using other tools). Regards, Alex Thanks, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Mike Spreitzer mspre...@us.ibm.com wrote on 29/09/2013 08:02:00 PM: Another reason to prefer host is that we have other resources to locate besides compute. Good point. Another approach (not necessarily contradicting) could be to specify the location as a property of host aggregate rather than individual hosts (and introduce similar notion in Cinder, and maybe Neutron). This could be an evolution/generalization of the existing 'availability zone' attribute, which would specify a more fine-grained location path (e.g., 'az_A:rack_R1:chassis_C2:node_N3'). We briefly discussed this approach at the previous summit (see 'simple implementation' under https://etherpad.openstack.org/HavanaTopologyAwarePlacement) -- but unfortunately I don't think we made much progress with the actual implementation in Havana (would be good to fix this in Icehouse). Regards, Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
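With the existing APIs, such a location path could already be attached to an aggregate as plain metadata, along these lines (the 'location' key is illustrative):

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')  # illustrative credentials
    agg = nova.aggregates.create('rack_R1', None)  # name, availability zone
    nova.aggregates.set_metadata(agg, {'location': 'az_A:rack_R1:chassis_C2'})
    nova.aggregates.add_host(agg, 'node_N3')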
Re: [openstack-dev] Questions related to live migration without target host
I tend to agree with Jake that this check is likely to conflict with the scheduler, and should be removed. Regards, Alex From: Guangya Liu j...@unitedstack.com To: openstack-dev@lists.openstack.org, Date: 03/09/2013 02:03 AM Subject:[openstack-dev] Questions related to live migration without target host Greetings, There is an issue related to live migration without a target host that might warrant more discussion/feedback from you experts: https://bugs.launchpad.net/nova/+bug/1214943. I have proposed a fix for this issue ( https://review.openstack.org/#/c/43213/); the fix directly removes the check for free RAM and always trusts the result from the Nova scheduler, as the scheduler has already selected the best host for live migration (via filter_scheduler.py:select_hosts). Please share your comments, if any; you can also append them directly to https://review.openstack.org/#/c/43213/. Thanks, Jake Liu UnitedStack Inc ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] multiple-scheduler-drivers blueprint
It seems that the main concern was that the overridden scheduler properties are taken from the flavor, and not from the aggregate. In fact, there was a consensus that this is not optimal. I think that we can still make some progress in Havana towards per-aggregate overrides, generalizing the recently merged changes that do just that -- for CPU and for memory with FilterScheduler (and leveraging a bit from the original multi-sched patch). As follows: 1. individual filters will call get_config('abc') instead of CONF.abc (already implemented in the current version of the multi-sched patch, e.g., https://review.openstack.org/#/c/37407/30/nova/scheduler/filters/io_ops_filter.py ) 2. get_config() will check whether abc is defined in the aggregate, and if so will return the value from the aggregate, and CONF.abc otherwise (already implemented in the recently merged AggregateCoreFilter and AggregateRamFilter -- e.g., https://review.openstack.org/#/c/33949/2/nova/scheduler/filters/core_filter.py ). 3. add a global flag that would enable or disable aggregate-based overrides This seems to be a relatively simple refactoring of existing code, still achieving an important portion of the original goals of this blueprint. Of course, we should still discuss the longer-term plan around scheduling policies at the summit. Thoughts? Regards, Alex From: Russell Bryant rbry...@redhat.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 27/08/2013 10:48 PM Subject:[openstack-dev] [Nova] multiple-scheduler-drivers blueprint Greetings, One of the important things to strive for in our community is consensus. When there's not consensus, we should take a step back and see if we need to change directions. There has been a lot of iterating on this feature, and I'm afraid we still don't have consensus around the design. Phil Day has been posting some really good feedback on the review. I asked Joe Gordon to take a look and provide another opinion. He agreed with Phil that we really need to have scheduler policies be a first-class API citizen. So, that pushes this feature out to Icehouse, as it doesn't seem possible to get this done in the required timeframe for Havana. If you'd really like to push to get this into Havana, please make your case. :-) Thanks, -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
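A minimal sketch of the get_config() helper described in steps 1-2 above, assuming the host state carries its aggregates, and glossing over conflicting values when a host belongs to several aggregates:

    from oslo.config import cfg

    CONF = cfg.CONF

    def get_config(host_state, key):
        # Per-aggregate override first, nova.conf value otherwise.
        for aggregate in host_state.aggregates:
            if key in aggregate.metadata:
                return aggregate.metadata[key]
        return getattr(CONF, key)

    # In a filter, instead of reading CONF.max_io_ops_per_host:
    #     max_io_ops = int(get_config(host_state, 'max_io_ops_per_host'))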
Re: [openstack-dev] [Nova] multiple-scheduler-drivers blueprint
Why can't something like this be done with just different filters, such as with AggregateRamFilter? Well, first, each of these filters today duplicates the code that handles aggregate-based overrides. So, it would make sense to have it in one place anyway. Second, why duplicate all the filters if this can be done with a single flag? Regards, Alex From: Joe Gordon joe.gord...@gmail.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 28/08/2013 09:32 PM Subject:Re: [openstack-dev] [Nova] multiple-scheduler-drivers blueprint On Wed, Aug 28, 2013 at 9:12 AM, Alex Glikson glik...@il.ibm.com wrote: It seems that the main concern was that the overridden scheduler properties are taken from the flavor, and not from the aggregate. In fact, there was a consensus that this is not optimal. I think that we can still make some progress in Havana towards per-aggregate overrides, generalizing the recently merged changes that do just that -- for CPU and for memory with FilterScheduler (and leveraging a bit from the original multi-sched patch). As follows: 1. individual filters will call get_config('abc') instead of CONF.abc (already implemented in the current version of the multi-sched patch, e.g., https://review.openstack.org/#/c/37407/30/nova/scheduler/filters/io_ops_filter.py ) 2. get_config() will check whether abc is defined in the aggregate, and if so will return the value from the aggregate, and CONF.abc otherwise (already implemented in the recently merged AggregateCoreFilter and AggregateRamFilter -- e.g., https://review.openstack.org/#/c/33949/2/nova/scheduler/filters/core_filter.py ). 3. add a global flag that would enable or disable aggregate-based overrides Why can't something like this be done with just different filters, such as with AggregateRamFilter? This seems to be a relatively simple refactoring of existing code, still achieving an important portion of the original goals of this blueprint. Of course, we should still discuss the longer-term plan around scheduling policies at the summit. Thoughts? Regards, Alex From:Russell Bryant rbry...@redhat.com To:OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date:27/08/2013 10:48 PM Subject:[openstack-dev] [Nova] multiple-scheduler-drivers blueprint Greetings, One of the important things to strive for in our community is consensus. When there's not consensus, we should take a step back and see if we need to change directions. There has been a lot of iterating on this feature, and I'm afraid we still don't have consensus around the design. Phil Day has been posting some really good feedback on the review. I asked Joe Gordon to take a look and provide another opinion. He agreed with Phil that we really need to have scheduler policies be a first-class API citizen. So, that pushes this feature out to Icehouse, as it doesn't seem possible to get this done in the required timeframe for Havana. If you'd really like to push to get this into Havana, please make your case.
:-) Thanks, -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] multiple-scheduler-drivers blueprint
Joe Gordon joe.gord...@gmail.com wrote on 28/08/2013 11:04:45 PM: Well, first, each of these filters today duplicates the code that handles aggregate-based overrides. So, it would make sense to have it in one place anyway. Second, why duplicate all the filters if this can be done with a single flag? We already have too many flags, and I don't want to introduce one that we plan on removing / deprecating in the near future if we can help it. Wouldn't it make sense to have a flag that enables/disables aggregate-based policy overrides anyway? https://github.com/openstack/nova/blob/master/nova/scheduler/filters/ram_filter.py doesn't duplicate all the code, it uses a base class. The 'check the aggregate for the value' logic is duplicated, but that is easy to fix. Yep, that's exactly what I'm saying -- the first step would be to put that logic in one place (e.g., scheduler/utils.py, like the get_config method we were originally thinking to introduce), and then we can easily reuse it in all the other filters (regardless of whether to do it within the existing filters, or to add an AggregateXYZ filter for each existing filter XYZ; the same potentially applies to weight functions, etc.). Alex ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Re: Proposal for approving Auto HA development blueprint.
Agree. Some enhancements to Nova might still be required (e.g., to handle resource reservations, so that there is enough capacity), but the end-to-end framework should probably live outside of the existing services, talking to Nova, Ceilometer and potentially other components (maybe Cinder, Neutron, Ironic), and 'orchestrating' failure detection, fencing and recovery. Probably worth a discussion at the upcoming summit. Regards, Alex From: Konglingxian konglingx...@huawei.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 13/08/2013 07:07 AM Subject:[openstack-dev] Re: Proposal for approving Auto HA development blueprint. Hi yongiman: Your idea is good, but I think the auto-HA operation is not OpenStack's business. IMO, Ceilometer offers 'monitoring', Nova offers 'evacuation', and you can combine them to realize the HA operation. So, I'm afraid I can't understand the specific implementation details very well. Any different opinions? From: yongi...@gmail.com [mailto:yongi...@gmail.com] Sent: 12 August 2013 20:52 To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] Proposal for approving Auto HA development blueprint. Hi, Now I am developing an auto-HA operation for VM high availability. The whole flow progresses automatically. It needs other services, like Ceilometer: Ceilometer monitors the compute nodes, and when it detects a broken compute node it sends an API call to Nova, which exposes an API for auto HA. When the auto-HA call is received, Nova performs the auto-HA operation. All auto-HA-enabled VMs that were running on the broken host are migrated to an auto-HA host, which is an extra compute node reserved only for the auto-HA function. Below are my blueprint and wiki page. The wiki page is not yet complete; I am adding more information about this function. Thanks https://blueprints.launchpad.net/nova/+spec/vm-auto-ha-when-host-broken https://wiki.openstack.org/wiki/Autoha ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Can we use two nova schedulers at the same time?
There are roughly three cases. 1. Multiple identical instances of the scheduler service. This is typically done to increase scalability, and is already supported (although it sometimes may result in provisioning failures due to race conditions between scheduler instances). There is a single queue of provisioning requests, all the scheduler instances are subscribed, and each request will be processed by one of the instances (randomly, more or less). I think this is not the option that you referred to, though. 2. Multiple cells, each having its own scheduler. This is also supported, but is applicable only if you decide to use cells (e.g., in large-scale geo-distributed deployments). 3. Multiple scheduler configurations within a single (potentially heterogeneous) Nova deployment, with dynamic selection of configuration/policy at run time (for simplicity let's assume just one scheduler service/runtime). This capability is under development ( https://review.openstack.org/#/c/37407/), targeting Havana. The current design is that the admin will be able to override scheduler properties (such as driver, filters, etc.) using flavor extra specs. In some cases you would want to combine this capability with a mechanism that would ensure disjoint partitioning of the managed compute nodes between the drivers. This can currently be achieved by using host aggregates and the AggregateInstanceExtraSpecsFilter of FilterScheduler. For example, if you want to apply driver_A on hosts in aggregate_X, and driver_B on hosts in aggregate_Y, you would have flavor AX specifying driver_A and properties that would map to aggregate_X, and similarly for BY. Hope this helps. Regards, Alex From: sudheesh sk sud...@yahoo.com To: openstack-dev@lists.openstack.org openstack-dev@lists.openstack.org, Date: 13/08/2013 10:30 AM Subject:[openstack-dev] Can we use two nova schedulers at the same time? Hi, 1) Can nova have more than one scheduler at a time? Standard Scheduler + one custom scheduler? 2) If it's possible to add multiple schedulers - how should we configure them? Let's say I have a scheduler called 'Scheduler', so nova.conf may look like below scheduler_manager = nova.scheduler.filters.SchedulerManager scheduler_driver = nova.scheduler.filter.Scheduler Then how can I add a second scheduler? 3) If there are 2 schedulers - will both of these be called when creating a VM? I am asking these questions based on a response I got from the Ask OpenStack forum. Thanks, Sudheesh ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
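A sketch of the aggregate/flavor pairing described in case 3, done with python-novaclient (the metadata key and names are illustrative, and the exact extra-spec key format depends on the filter version):

    from novaclient.v1_1 import client

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone:5000/v2.0')  # illustrative credentials

    # Hosts in aggregate_X are tagged with scheduler profile 'A'.
    agg_x = nova.aggregates.create('aggregate_X', None)
    nova.aggregates.set_metadata(agg_x, {'sched_profile': 'A'})
    nova.aggregates.add_host(agg_x, 'host1')

    # Flavor AX maps to aggregate_X via AggregateInstanceExtraSpecsFilter;
    # it would also carry whatever extra spec selects driver_A at runtime.
    ax = nova.flavors.create('m1.small.AX', 2048, 1, 20)  # name, MB, vcpus, GB
    ax.set_keys({'sched_profile': 'A'})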
Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers
It is certainly an interesting idea to have a policy service managed via APIs, and to have the scheduler as a potential consumer of such a service. However, I suspect that this requires more discussion, and certainly can't be added for Havana (you can count on me to suggest it as a topic for the upcoming design summit). Moreover, I think the currently proposed implementation (incorporating some of the initial feedback provided in this thread) introduces 80% of the value, with 20% of the effort and complexity. If anyone has specific suggestions on how to make it better without adding another 1000 lines of code -- I would be more than glad to adjust. IMO, it is better to start simple in Havana, start getting feedback from the field regarding specific usability/feature requirements earlier rather than later, and incrementally improve going forward. The current design provides clear added value, while not introducing anything that would be conceptually difficult to change in the future (e.g., no new APIs, no schema changes, fully backwards compatible). By the way, the inspiration for the current design was the multi-backend support in Cinder, where a similar approach is used to define multiple Cinder backends in cinder.conf, and simple logic is used to select the appropriate one at runtime based on the name of the corresponding section. Regards, Alex P.S. The code is ready for review. Jenkins is still failing, but this seems to be due to a bug which has been reported, fixed, and will be merged soon. Day, Phil philip@hp.com wrote on 28/07/2013 01:29:22 PM: From: Day, Phil philip@hp.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 28/07/2013 01:36 PM Subject: Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers From: Joe Gordon [mailto:joe.gord...@gmail.com] Sent: 26 July 2013 23:16 To: OpenStack Development Mailing List Subject: Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers On Wed, Jul 24, 2013 at 6:18 PM, Alex Glikson glik...@il.ibm.com wrote: Russell Bryant rbry...@redhat.com wrote on 24/07/2013 07:14:27 PM: I really like your point about not needing to set things up via a config file. That's fairly limiting since you can't change it on the fly via the API. True. As I pointed out in another response, the ultimate goal would be to have policies as 'first class citizens' in Nova, including a DB table, API, etc. Maybe even a separate policy service? But in the meantime, it seems that the approach with a config file is a reasonable compromise in terms of usability, consistency and simplicity. I think we need to be looking, in the future, at being able to delegate large parts of the functionality that is currently admin-only in Nova, and a large part of that is moving things like this from the config file into APIs. Once we have the Domain capability in keystone fully available to services like Nova, we need to think more about ownership of resources like hosts, and being able to delegate this kind of capability. I do like your idea of making policies first-class citizens in Nova, but I am not sure doing this in nova is enough. Wouldn't we need similar things in Cinder and Neutron? Unfortunately this does tie into how to do good scheduling across multiple services, which is another rabbit hole altogether.
I don't like the idea of putting more logic in the config file; as it is, the config files are already too complex, making running any OpenStack deployment require some config-file templating and some metadata magic (like heat). I would prefer to keep things like this in aggregates, or something else with a REST API. So why not build a tool on top of aggregates to push the appropriate metadata into the aggregates? This will give you a central point to manage policies that can easily be updated on the fly (unlike config files). I agree with Joe on this point, and this is the approach we're taking with the Pcloud / whole-host-allocation blueprint: https://review.openstack.org/#/c/38156/ https://wiki.openstack.org/wiki/WholeHostAllocation I don't think realistically we'll be able to land this in Havana now (as much as anything, I don't think it has had enough air time yet to be sure we have a consensus on all of the details), but Rackspace are now helping with part of this and we do expect to have something in a PoC / demonstrable state for the Design Summit to provide a more focused discussion. Because the code is layered on top of existing aggregate and scheduler features, it's pretty easy to keep it as something we can just keep rebasing. Regards, Phil ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
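To show what cinder-multi-backend-style policy sections could amount to in code, a hypothetical sketch of parsing them with oslo.config (all option and section names are invented for illustration):

    import sys
    from oslo.config import cfg

    CONF = cfg.CONF
    CONF.register_opt(cfg.ListOpt('scheduler_policies', default=[]))

    policy_opts = [
        cfg.StrOpt('scheduler_driver'),
        cfg.ListOpt('scheduler_default_filters'),
        cfg.FloatOpt('ram_allocation_ratio', default=1.5),
    ]

    CONF(sys.argv[1:], project='nova')  # reads nova.conf

    # One [policy_xyz] section per name listed in scheduler_policies,
    # selected at runtime e.g. by a matching aggregate/flavor property.
    for name in CONF.scheduler_policies:
        CONF.register_opts(policy_opts, group=name)
        policy = getattr(CONF, name)
        print('%s: ram_allocation_ratio=%s' % (name, policy.ram_allocation_ratio))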
Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers
Day, Phil philip@hp.com wrote on 24/07/2013 12:39:16 PM:

If you want to provide a user with a choice about how much overcommit they will be exposed to, then doing that in flavours and the aggregate_instance_extra_spec filter seems the more natural way to do this, since presumably you'd want to charge differently for those, and the flavour list is normally what is linked to the pricing model.

So, there are 2 aspects here. First, whether policy should be part of the flavor definition or separate. I claim that in some cases it would make sense to specify it separately. For example, if we want to support multiple policies for the same virtual hardware configuration, making policy part of the flavor extra spec would potentially multiply the number of virtual hardware configurations (which is what flavors essentially are) by the number of policies -- contributing to an explosion in the number of flavors in the system. Moreover, although in some cases you would want the user to be aware of and distinguish between policies, this is not always the case. For example, the admin may want to apply a consolidation/packing policy in one aggregate, and spreading in another. Showing two different flavors does not seem reasonable in such cases.

Secondly, even if the policy *is* defined in the flavor extra spec, I can see value in having a separate filter to handle it. I personally see the main use-case for the extra spec filter in supporting matching of capabilities. Resource management policy is something which should be hidden, or at least abstracted, from the user. And enforcing it with a separate filter could be a 'cleaner' design, and also more convenient -- both from a developer perspective and an admin perspective.

I also like the approach taken by the recent changes to the ram filter, where the scheduling characteristics are defined as properties of the aggregate rather than separate stanzas in the configuration file.

Indeed, a subset of the scenarios we had in mind can be implemented by making each property of each filter/weight an explicit key-value of the aggregate, and making each of the filters/weights aware of those aggregate properties. However, our design has several potential advantages:
1) different policies can have different sets of filters/weights;
2) different policies can even be enforced by different drivers;
3) the configuration is more maintainable -- the admin defines policies in one place, and not in 10 places (if you have a large environment with 10 aggregates). One of the side-effects is improved consistency -- if the admin needs to change a policy, he needs to do it in one place, and he can be sure that all the aggregates comply with one of the valid policies;
4) the developer of filters/weights does not need to care whether the parameters are persisted in nova.conf or in aggregate properties.

An alternative, and the use case I'm most interested in at the moment, is where we want the user to be able to define the scheduling policies on a specific set of hosts allocated to them (in this case they pay for the host, so if they want to oversubscribe on memory/cpu/disk then they should be able to). [...] It's not clear to me if what you're proposing addresses an additional gap between this and the combination of the aggregate_extra_spec filter + revised filters that get their configuration from aggregates?

IMO, this can be done with our proposed implementation.
Going forward, I think that policies should be first-class citizens (rather than static sections in nova.conf, or just sets of key-value pairs associated with aggregates). Then we can provide APIs to manage them in a more flexible manner.

Regards, Alex

Cheers, Phil

-----Original Message-----
From: Russell Bryant [mailto:rbry...@redhat.com]
Sent: 23 July 2013 22:32
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers

On 07/23/2013 04:24 PM, Alex Glikson wrote:

Russell Bryant rbry...@redhat.com wrote on 23/07/2013 07:19:48 PM:

I understand the use case, but can't it just be achieved with 2 flavors and without this new aggregate-policy mapping? flavor 1 with extra specs to say aggregate A and policy Y; flavor 2 with extra specs to say aggregate B and policy Z.

I agree that this approach is simpler to implement. One of the differences is the level of enforcement that instances within an aggregate are managed under the same policy. For example, nothing would prevent the admin from defining 2 flavors with conflicting policies that can be applied to the same aggregate. Another aspect of the same problem is the case when the admin wants to apply 2 different policies in 2 aggregates with the same capabilities/properties. A natural way to distinguish between the two would be to add an artificial property that would be different between the two -- but then just specifying the policy would make most sense.
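To illustrate the aggregate-property approach Phil refers to above (the ram filter reading its allocation ratio from aggregate metadata), the admin-side commands might look like this -- a sketch assuming the aggregate-aware ram filter is enabled, with illustrative aggregate and host names:

    # Create an aggregate and override the memory overcommit for its hosts;
    # the aggregate-aware ram filter reads this key from the aggregate
    # metadata instead of using the global nova.conf value.
    nova aggregate-create overcommit-pool
    nova aggregate-add-host overcommit-pool compute-01
    nova aggregate-set-metadata overcommit-pool ram_allocation_ratio=2.0

Note that each filter has to know how to read its own key from the aggregate, and with 10 aggregates the same settings end up repeated across 10 sets of metadata -- the maintainability concern raised in point 3 of Alex's list above, which the policy-based design addresses by keeping the settings together in one named place.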
Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers
Russell Bryant rbry...@redhat.com wrote on 24/07/2013 07:14:27 PM:

I really like your point about not needing to set things up via a config file. That's fairly limiting since you can't change it on the fly via the API.

True. As I pointed out in another response, the ultimate goal would be to have policies as 'first-class citizens' in Nova, including a DB table, API, etc. Maybe even a separate policy service? But in the meantime, it seems that the approach with the config file is a reasonable compromise in terms of usability, consistency and simplicity.

Regards, Alex

-- Russell Bryant
Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers
Russell Bryant rbry...@redhat.com wrote on 23/07/2013 05:35:18 PM:

#1 - policy associated with a host aggregate. This seems very odd to me. Scheduling policy is what chooses hosts, so having a subset of hosts specify which policy to use seems backwards.

This is not what we had in mind. The host aggregate is selected based on the policy passed in the request (hint, extra spec, or whatever -- see below) and the 'policy' attribute of the aggregate -- possibly in conjunction with 'regular' aggregate filtering. And not the other way around. Maybe the design document is not clear enough about this point.

Then I don't understand what this adds over the existing ability to specify an aggregate using extra_specs.

The added value is in the ability to configure the scheduler accordingly -- potentially differently for different aggregates -- in addition to just restricting the target host to those belonging to an aggregate with certain properties. For example, let's say we want to support two classes of workloads: CPU-intensive and memory-intensive. The administrator may decide to use 2 different hardware models, and configure one aggregate with lots of CPU, and another aggregate with lots of memory. In addition to just routing an incoming provisioning request to the correct aggregate (which can be done already), we may want different cpu_allocation_ratio and ram_allocation_ratio values when managing resources in each of the aggregates. In order to support this, we would define 2 policies (with the corresponding configuration of filters), and attach each one to the corresponding aggregate.

#2 - via a scheduler hint. How about just making the scheduling policy choice as simple as an item in the flavor extra specs?

This is certainly an option. It would be just another implementation of the policy selection interface (implemented using filters). In fact, we already have it implemented -- we just thought that an explicit hint could be more straightforward to start with. Will include the implementation based on flavor extra specs in the next commit.

Ok. I'd actually prefer to remove the scheduler hint support completely.

OK, removing the support for doing it via a hint is easy :-)

I'm not even sure it makes sense to make this pluggable. I can't think of why something other than flavor extra specs is necessary and justifies the additional complexity.

Well, I can think of a few use-cases where the selection approach might be different. For example, it could be based on tenant properties (derived from some kind of SLA associated with the tenant, determining the over-commit levels), or image properties (e.g., I want to determine placement of Windows instances taking into account Windows licensing considerations), etc.

I think some additional examples would help. It's also important to have this laid out for documentation purposes.

OK, sure, will add more. Hopefully the few examples above are also helpful to clarify the intention/design.

Regards, Alex

-- Russell Bryant
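Under the proposed design, the CPU-intensive/memory-intensive example above might be configured along these lines (the section naming and the 'policy' aggregate key are illustrative only -- the authoritative syntax is in the design document [2]):

    # nova.conf: one section per policy, overriding the default ratios
    [policy:cpu_bound]
    cpu_allocation_ratio = 1.0
    ram_allocation_ratio = 4.0

    [policy:mem_bound]
    cpu_allocation_ratio = 8.0
    ram_allocation_ratio = 1.0

    # attach each policy to the matching aggregate by name
    nova aggregate-set-metadata cpu-hosts policy=cpu_bound
    nova aggregate-set-metadata mem-hosts policy=mem_bound

The point of the design is that the scheduler, having routed a request to an aggregate, applies that aggregate's named policy section rather than the global defaults.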
Re: [openstack-dev] [Nova] support for multiple active scheduler policies/drivers
Russell Bryant rbry...@redhat.com wrote on 23/07/2013 07:19:48 PM:

I understand the use case, but can't it just be achieved with 2 flavors and without this new aggregate-policy mapping? flavor 1 with extra specs to say aggregate A and policy Y; flavor 2 with extra specs to say aggregate B and policy Z.

I agree that this approach is simpler to implement. One of the differences is the level of enforcement that instances within an aggregate are managed under the same policy. For example, nothing would prevent the admin from defining 2 flavors with conflicting policies that can be applied to the same aggregate. Another aspect of the same problem is the case when the admin wants to apply 2 different policies in 2 aggregates with the same capabilities/properties. A natural way to distinguish between the two would be to add an artificial property that would be different between the two -- but then just specifying the policy would make most sense.

Well, I can think of a few use-cases where the selection approach might be different. For example, it could be based on tenant properties (derived from some kind of SLA associated with the tenant, determining the over-commit levels), or image properties (e.g., I want to determine placement of Windows instances taking into account Windows licensing considerations), etc.

Well, you can define tenant-specific flavors that could have different policy configurations.

Would it be possible to express something like 'I want CPU over-commit of 2.0 for tenants with SLA=GOLD, and 4.0 for tenants with SLA=SILVER'?

I think I'd rather hold off on the extra complexity until there is a concrete implementation of something that requires and justifies it.

The extra complexity is actually not that huge -- we reuse the existing mechanism of generic filters. Regarding both suggestions -- I think the value of this blueprint will be somewhat limited if we keep just the simplest version. But if people think that it makes a lot of sense to do it in small increments -- we can probably split the patch into smaller pieces.

Regards, Alex

-- Russell Bryant
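The flavor-based alternative Russell suggests could be sketched with standard commands as follows (the sla key and its values are hypothetical, and this assumes the AggregateInstanceExtraSpecsFilter and the aggregate-aware core filter are enabled):

    # Two aggregates with different CPU overcommit, tagged by SLA level
    nova aggregate-set-metadata gold-hosts sla=gold cpu_allocation_ratio=2.0
    nova aggregate-set-metadata silver-hosts sla=silver cpu_allocation_ratio=4.0

    # A private flavor pinned to the gold aggregate; grant it to gold tenants
    nova flavor-create --is-public false gold.large auto 8192 80 4
    nova flavor-key gold.large set aggregate_instance_extra_specs:sla=gold
    nova flavor-access-add gold.large <gold-tenant-id>

This answers the GOLD/SILVER question above only indirectly: the SLA-to-overcommit mapping lives in flavor access lists rather than being derived from tenant properties, which is the gap Alex is pointing at.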
[openstack-dev] [Nova] support for multiple active scheduler policies/drivers
Dear all,

Following the initial discussions at the last design summit, we have published the design [2] and a first take on the implementation [3] of the blueprint adding support for multiple active scheduler policies/drivers [1]. In a nutshell, the idea is to allow overriding the 'default' scheduler configuration parameters (driver, filters, their configuration parameters, etc.) for particular host aggregates. The 'policies' are introduced as sections in nova.conf, and each host aggregate can have a key-value pair specifying the policy (by name).

Comments on the design or implementation are welcome!

Thanks, Alex

[1] https://blueprints.launchpad.net/nova/+spec/multiple-scheduler-drivers
[2] https://wiki.openstack.org/wiki/Nova/MultipleSchedulerPolicies
[3] https://review.openstack.org/#/c/37407/
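As a rough illustration of the proposal (all names below are illustrative -- see the design wiki [2] for the actual proposed syntax), a policy section overrides the scheduler defaults, and an aggregate opts in by policy name:

    # nova.conf
    [DEFAULT]
    scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
    scheduler_default_filters = RetryFilter,RamFilter,ComputeFilter

    # a 'packing' policy: a negative ram weight stacks instances
    # onto the fewest hosts instead of spreading them
    [policy:packing]
    ram_weight_multiplier = -1.0

    # associate an aggregate with the policy via a key-value pair
    nova aggregate-set-metadata consolidated-hosts policy=packing

Hosts outside any policy-tagged aggregate would continue to be scheduled with the DEFAULT configuration, which is what makes the change backwards compatible as described above.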