Re: [openstack-dev] Change in openstack/heat[master]: Implement a Heat-native resource group
What is the rationale for this new feature? Since there is already an autoscaling group implemented by Heat, what is the added benefit here? And why is it being done as another heat-native thing rather than as an independent service (e.g., as outlined in https://wiki.openstack.org/wiki/Heat/AutoScaling for an autoscaling group service)?

Thanks,
Mike
Re: [openstack-dev] Change in openstack/heat[master]: Implement a Heat-native resource group
Clint Byrum cl...@fewbar.com wrote on 10/17/2013 09:16:12 PM:
> Excerpts from Mike Spreitzer's message of 2013-10-17 17:19:58 -0700:
> > What is the rationale for this new feature? Since there is already an autoscaling group implemented by Heat, what is the added benefit here? And why is it being done as another heat-native thing rather than as an independent service (e.g., as outlined in https://wiki.openstack.org/wiki/Heat/AutoScaling for an autoscaling group service)?
>
> This supports that design quite well. The point is to be able to group and clone any resource, not just server/instance. So autoscaling might be configured to manage a group of Trove database instances which are then fed as a list to a group of separately autoscaled webservers.

Thanks for the answer. I'm just a newbie here, trying to understand what's going on. I still don't quite follow. https://wiki.openstack.org/wiki/Heat/AutoScaling says that what's autoscaled is a set of resources, not just one. Can there be dependencies among the resources in that set? For example, is the intent that I could autoscale a pair of (DB server, web server) where the web server's properties depend on the DB server's attributes? If so, would it be problematic to implement that in terms of a pair of Heat-native resource groups?

BTW, is there some place I could have read the answers to my questions about the design thinking here?

Thanks,
Mike
Re: [openstack-dev] [Heat] HOT Software configuration proposal
Steven Hardy sha...@redhat.com wrote on 10/16/2013 04:11:40 AM:
> ... IMO we should be abstracting the software configuration complexity behind a Heat resource interface, not pushing it up to a pre-processor (which implies some horribly complex interfaces at the heat template level)

I am not sure I follow. Can you please elaborate on the horrible implication?

Thanks,
Mike
Re: [openstack-dev] [Heat] HOT Software configuration proposal
Zane Bitter zbit...@redhat.com wrote on 10/16/2013 10:30:44 AM:
> On 16/10/13 15:58, Mike Spreitzer wrote:
> > ... Thanks for a great short sharp answer. In that light, I see a concern. Once a workflow has been generated, the system has lost the ability to adapt to changes in either model. In a highly concurrent and dynamic environment, that could be problematic.
>
> I think you're referring to the fact that if reality diverges from the model we have no way to bring it back in line (and even when doing an update, things can and usually will go wrong if Heat's idea of the existing template does not reflect reality any more). If so, then I agree that we are weak in this area. You're obviously aware of http://summit.openstack.org/cfp/details/95 so it is definitely on the radar.

Actually, I am thinking of both of the two models you mentioned. We are only in the midst of implementing an even newer (Heat-based) design, but for my group's old code we have a revised design in which the infrastructure orchestrator can react to being overtaken by later updates to the model we call target state (whose origin/source is the client) as well as concurrent updates to the model we call observed state (whose origin/source is the hardware/hypervisor). I haven't yet decided what to recommend to the heat community, so I'm just mentioning the issue as a possible concern.

Thanks,
Mike
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
Mike Wilson geekinu...@gmail.com wrote on 10/16/2013 07:13:17 PM:
> I need to understand better what holistic scheduling means, ...

By holistic I simply mean making a joint decision all at once about a bunch of related resources of a variety of types. For example, making a joint decision about where to place a set of VMs and the Cinder volumes that will be attached to the VMs.

Regards,
Mike
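To make "joint decision" concrete, here is a small hypothetical Python sketch; the host/backend data and the shared-rack rule are invented for illustration. It picks a Nova host and a Cinder backend in one step so that both fit and sit in the same rack, rather than letting each service decide independently:

    # Hypothetical sketch: choose a host for a VM and a backend for its volume
    # in one joint step, so the pair satisfies capacity and a shared-rack rule.
    hosts = [{"name": "h1", "rack": "r1", "free_vcpus": 4},
             {"name": "h2", "rack": "r2", "free_vcpus": 16}]
    backends = [{"name": "b1", "rack": "r1", "free_gb": 500},
                {"name": "b2", "rack": "r2", "free_gb": 2000}]

    def place_jointly(vcpus, gb):
        """Return a (host, backend) pair satisfying both requests, or None."""
        for h in hosts:
            for b in backends:
                if h["rack"] == b["rack"] and h["free_vcpus"] >= vcpus and b["free_gb"] >= gb:
                    return h["name"], b["name"]
        return None

    print(place_jointly(vcpus=8, gb=1000))   # -> ('h2', 'b2')

A scheduler deciding each piece independently could pick h2 for the VM and b1 for the volume and violate the rack constraint; the joint decision considers the pair together.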
Re: [openstack-dev] [Heat] HOT Software configuration proposal
Steve Baker sba...@redhat.com wrote on 10/15/2013 06:48:53 PM:
> From: Steve Baker sba...@redhat.com
> To: openstack-dev@lists.openstack.org
> Date: 10/15/2013 06:51 PM
> Subject: [openstack-dev] [Heat] HOT Software configuration proposal
>
> I've just written some proposals to address Heat's HOT software configuration needs, and I'd like to use this thread to get some feedback:
> https://wiki.openstack.org/wiki/Heat/Blueprints/hot-software-config

In that proposal, each component can use a different configuration management tool.

> https://wiki.openstack.org/wiki/Heat/Blueprints/native-tools-bootstrap-config

In this proposal, I get the idea that it is intended that each Compute instance run only one configuration management tool. At least, most of the text discusses the support (e.g., the idea that each CM tool supplies userdata to bootstrap itself) in terms appropriate for a single CM tool per instance; also, there is no discussion of combining userdata from several CM tools.

I agree with the separation of concerns issues that have been raised. I think all this software config stuff can be handled by a pre-processor that takes an extended template in and outputs a plain template that can be consumed by today's heat engine (no extension to the heat engine necessary).

Regards,
Mike
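As a rough illustration of the pre-processor idea (purely a sketch of the shape of the transformation; the "components" key and the template fragments are invented for the example, not part of either proposal): it reads an extended template whose servers carry software components and emits a plain template in which those components have been folded into each server's user_data.

    # Hypothetical sketch of the pre-processor: extended template in,
    # plain template (consumable by today's heat engine) out.
    import copy, json

    def preprocess(extended):
        plain = copy.deepcopy(extended)
        for name, res in plain.get("resources", {}).items():
            components = res.pop("components", None)
            if components:
                # Fold the components into ordinary userdata for this server.
                script = "#!/bin/sh\n" + "\n".join(c["script"] for c in components)
                res.setdefault("properties", {})["user_data"] = script
        return plain

    extended = {"resources": {"web": {
        "type": "OS::Nova::Server",
        "properties": {"flavor": "m1.small"},
        "components": [{"script": "apt-get install -y apache2"}]}}}
    print(json.dumps(preprocess(extended), indent=2))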
Re: [openstack-dev] [Heat] HOT Software configuration proposal
The threading in the archive includes this discussion under the "HOT Software orchestration proposal for workflows" heading, and the overall ordering in the archive looks very mixed up to me. I am going to reply here, hoping that the new subject line will be subject to less strange ordering in the archive; this is really a continuation of the overall discussion, not just Steve Baker's proposal.

What is the difference between what today's heat engine does and a workflow? I am interested to hear what you experts think; I hope it will be clarifying. I presume the answers will touch on things like error handling, state tracking, and updates.

I see the essence of Steve Baker's proposal to be that of doing the minimal mods necessary to enable the heat engine to orchestrate software components. The observation is that not much has to change, since the heat engine is already in the business of calling out to things and passing values around. I see a little bit of a difference, maybe because I am too new to already know why it is not an issue. In today's heat engine, the calls are made to fixed services to do CRUD operations on virtual resources in the cloud, using credentials managed implicitly; the services have fixed endpoints, even as the virtual resources come and go. Software components have no fixed service endpoints; the service endpoints come and go as the host Compute instances come and go. I did not notice a story about authorization for the software component calls.

Interestingly, Steve Baker's proposal reminds me a lot of Chef. If you just rename Steve's "component" to "recipe", the alignment gets real obvious; I am sure it is no accident. I am not saying it is isomorphic --- clearly Steve Baker's proposal has more going on, with its cross-VM data dependencies and synchronization. But let me emphasize that we can start to see a different way of thinking here. Rather than focusing on a centrally-run workflow, think of each VM as independently running its own series of recipes --- with the recipe invocations now able to communicate and synchronize between VMs as well as within VMs. Steve Baker's proposal uses two forms of communication and synchronization between VMs: (1) get_attr and (2) wait conditions and handles (sugar coated or not). The implementation of (1) is part of the way the heat engine invokes components; the implementation of (2) is independent of the heat engine.

Using the heat engine for orchestration is limited to the kinds of logic that the heat engine can run. This may be one reason people are suggesting using a general workflow engine. However, the recipes (components) running in the VMs can do general computation; if we allow general cross-VM communication and synchronization as part of those general computations, we clearly have a more expressive system than the heat engine. Of course, a general distributed computation can get itself into trouble (e.g., deadlock, livelock). If we structure that computation as a set of components (recipe invocations) with a DAG of dependencies then we avoid those troubles. And the kind of orchestration that the heat engine does is sufficient to invoke such components.

Structuring software orchestration as a DAG of components also gives us a leg up on UPDATE. Rather than asking the user to write a workflow for each different update, or a general meta-workflow that does introspection to decide what work needs to be done, we ask the thing that invokes the components to run through the components in the way that today's heat engine runs through resources for an UPDATE.

Lakshmi has been working on a software orchestration technique that is also centered on the idea of a DAG of components. It was created before we got real interested in Heat. It is implemented as a pre-processor that runs upstream of where today's heat engine goes, emitting fairly minimal userdata needed for bootstrapping. The dependencies between recipe invocations are handled very smoothly in the recipes, which are written in Chef. No hackery is needed in the recipe text at all (thanks to Ruby metaprogramming); what is needed is only an additional declaration of what are the cross-VM inputs and outputs of each recipe. The propagation of data and synchronization between VMs is handled, under the covers, via simple usage of ZooKeeper (other implementations are reasonable too). But the idea of heat-independent propagation of data and synchronization among a DAG of components is not limited to Chef-based components, and can appear fairly smooth in any recipe language.

A value of making software orchestration independent of today's heat engine is that it enables the four-stage pipeline that I have sketched at https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and whose ordering of functionality has been experimentally vetted with some non-trivial examples. The first big one
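To make the declared inputs/outputs idea concrete, here is a minimal hypothetical sketch (my own illustration, not Lakshmi's Chef-based implementation): each component declares what it consumes and produces, the runner derives the DAG from those declarations alone, and a plain dict stands in for a coordination service such as ZooKeeper.

    # Hypothetical sketch: components with declared cross-VM inputs/outputs,
    # ordered by the DAG those declarations imply and wired through a store.
    from graphlib import TopologicalSorter  # Python 3.9+

    class Component:
        def __init__(self, name, inputs=(), outputs=(), run=None):
            self.name, self.inputs, self.outputs, self.run = name, tuple(inputs), tuple(outputs), run

    def execute(components):
        store = {}  # stands in for ZooKeeper or another coordination service
        producer = {out: c.name for c in components for out in c.outputs}
        graph = {c.name: {producer[i] for i in c.inputs} for c in components}
        by_name = {c.name: c for c in components}
        for name in TopologicalSorter(graph).static_order():
            c = by_name[name]
            results = c.run({k: store[k] for k in c.inputs}) or {}
            store.update({k: results[k] for k in c.outputs})
        return store

    db = Component("configure_db", outputs=["db_ip"],
                   run=lambda ins: {"db_ip": "10.0.0.5"})          # placeholder value
    web = Component("configure_web", inputs=["db_ip"],
                    run=lambda ins: print("web uses", ins["db_ip"]))
    execute([db, web])

In a real deployment each component would run in its own VM and the store would be the coordination service, but the shape of the dependency handling is the same.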
Re: [openstack-dev] [scheduler] Policy Model
Consider the example at https://docs.google.com/drawings/d/1nridrUUwNaDrHQoGwSJ_KXYC7ik09wUuV3vXw1MyvlY

We could indeed have distinct policy objects. But I think they are policy *uses*, not policy *definitions* --- which is why I prefer to give them less prominent lifecycles. In the example cited above, one policy use object might be {id: some int, type: anti_collocation, properties: {level: rack}}, and there are four references to it; another policy use object might be {id: some int, type: network_reachability}, and there are three references to it.

What object should own the policy use objects? You might answer that policy uses are owned by groups. I do not think it makes sense to give them a more prominent lifecycle. As I said, my preference would be to give them a less prominent lifecycle. I would be happy to see each policy use owned by an InstanceGroupPolicy[Use] that references it and allow only one reference per policy use --- in other words, make the InstanceGroupPolicy[Use] class inherit from the Policy Use class. And since I am not proposing that anything else inherit from the Policy Use class, I would even more prefer to see its contents simply merged inline into the InstanceGroupPolicy[Use] class.

Regards,
Mike

From: Yathiraj Udupi (yudupi) yud...@cisco.com
To: Mike Spreitzer/Watson/IBM@IBMUS
Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Date: 10/14/2013 01:38 PM
Subject: Re: [scheduler] Policy Model

> Mike, Like I proposed in my previous email about the model and the APIs, about the InstanceGroupPolicy, why not leave it as is, and introduce a new abstract model class called Policy. The InstanceGroupPolicy will be a reference to a Policy object saved separately, and the policy field will point to the saved Policy object's unique name or id. The new class Policy can have the usual fields - id, name, uuid, and a dictionary of key-value pairs for any additional arguments about the policy. This is in alignment with the model for InstanceGroupMember, which is a reference to an actual Instance object saved in the DB. I will color all the diamonds black to make it a composition in the UML diagram.
> Thanks,
> Yathi.
>
> From: Mike Spreitzer mspre...@us.ibm.com
> Date: Monday, October 14, 2013 7:14 AM
> To: Yathiraj Udupi yud...@cisco.com
> Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org
> Subject: [scheduler] Policy Model
>
> > Could we agree on the following small changes to the model you posted last week?
> > 1. Rename InstanceGroupPolicy to InstanceGroupPolicyUse
> > 2. In InstanceGroupPolicy[Use], rename the policy field to policy_type
> > 3. Add an InstanceGroupPolicyUseProperty table, holding key/value pairs (two strings) giving the properties of the policy uses
> > 4. Color all the diamonds black
> > Thanks,
> > Mike
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
That came through beautifully formatted to me, but it looks much worse in the archive. I'm going to use crude email tech here, so that I know it won't lose anything in handling.

Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/14/2013 01:17:47 PM:
> I read your email where you expressed concerns regarding create-time dependencies, and I agree they are valid concerns to be addressed. But like we all agree, as a starting point, we are just focusing on the APIs for now, and will leave that aside as implementation details to be addressed later.

I am not sure I understand your language here. To me, design decisions that affect what calls the clients make are not implementation details, they are part of the API design.

> Thanks for sharing your suggestions on how we can simplify the APIs. I think we are getting closer to finalizing this one. Let us start at the model proposed here - [1] https://docs.google.com/document/d/17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit?usp=sharing (Ignore the white diamonds - they will be black, when I edit the doc)
> The InstanceGroup represents all the information necessary to capture the group - nodes, edges, policies, and metadata.
> InstanceGroupMember - is a reference to an Instance, which is saved separately, using the existing Instance Model in Nova.

I think you mean this is a reference to either a group or an individual Compute instance.

> InstanceGroupMemberConnection - represents the edge.
> InstanceGroupPolicy is a reference to a Policy, which will also be saved separately (currently not existing in the model, but has to be created). Here in the Policy model, I don't mind adding any number of additional fields and key-value pairs to be able to fully define a policy. I guess a Policy-metadata dictionary is sufficient to capture all the required arguments. The InstanceGroupPolicy will be associated to a group as a whole or an edge.

Like I said under separate cover, I think one of these is a policy *use* rather than a policy *definition*. I go further and emphasize that the interesting out-of-scope definitions are of policy *types*. A policy type takes parameters. For example, policies of the anti-collocation (AKA anti-affinity) type have a parameter that specifies the level in the physical hierarchy where the location must differ (rack, host, ...). Each policy type specifies a set of parameters, just like a procedure specifies parameters; each use of a policy type supplies values for the parameters, just like a procedure invocation supplies values for the procedure's parameters. I suggest separating parameter values from metadata; the former are described by the policy type, while the latter are unknown to the policy type and are there for other needs of the client.

Yes, a use of a policy type is associated with a group or an edge. In my own writing I have suggested a third possibility: that a policy use can be directly associated with an individual resource. It just so happens that the code my group already has been running also has your restriction: it supports only policies associated with groups and relationships. But I suggested allowing direct attachment to resources (as well as relationships also being able to directly reference resources instead of groups) because I think this restriction --- while it simplifies implementation --- makes templates more verbose; I felt the latter was a more important consideration than the former. If you want to roadmap this --- restricted first, liberal later --- that's fine with me.

> InstanceGroupMetadata - represents a key-value dictionary for any additional metadata for the instance group.
> I think this should fully support what we care about - nodes, edges, policies and metadata. Do we all agree?

Yes, with exceptions noted above.

> Now going to the APIs,
> Register GROUP API (from my doc [1]): POST /v3.0/{tenant_id}/groups --- Register a group

In such specs it would be good to be explicit about the request parameters and body. If I follow correctly, https://review.openstack.org/#/c/30028/25/doc/api_samples/os-instance-groups/instance-groups-post-req.json shows us that you intended (as of that patch) the body to carry a group definition.

> I think the confusion is only about when the members (all nested members) and policies are saved in the DB (registered, but not actually CREATED), such that we can associate a UUID. This led to my original thinking that it is a 3-phase operation where we have to register (save in DB) the nested members first, then register the group as a whole. But this is not client friendly. Like I had suggested earlier, as an implementation detail of the Group registration API (CREATE part 1 in your terminology), we can support this: as part of the group registration transaction, complete the registration of the nested members, get their UUIDs, create
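To illustrate the distinction drawn above between a policy type, the parameter values supplied by a use of it, and client metadata, here is a small hypothetical sketch; the names and validation rules are mine, not part of the proposed model:

    # Hypothetical sketch: a policy type declares parameters, like a procedure;
    # a policy use supplies values for them, plus metadata opaque to the type.
    ANTI_COLLOCATION = {
        "type": "anti_collocation",
        "parameters": {"level": {"choices": ["rack", "host"], "required": True}},
    }

    def make_policy_use(policy_type, values, metadata=None):
        """Validate the supplied values against the type and build a policy use."""
        for name, spec in policy_type["parameters"].items():
            if spec.get("required") and name not in values:
                raise ValueError("missing parameter: " + name)
            if "choices" in spec and name in values and values[name] not in spec["choices"]:
                raise ValueError("bad value for %s: %r" % (name, values[name]))
        return {"type": policy_type["type"],
                "properties": dict(values),        # interpreted by the policy type
                "metadata": dict(metadata or {})}  # ignored by the policy type

    use = make_policy_use(ANTI_COLLOCATION, {"level": "rack"}, metadata={"owner": "web-tier"})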
Re: [openstack-dev] Scheduler meeting and Icehouse Summit
Yes, "Rethinking Scheduler Design" (http://summit.openstack.org/cfp/details/34) is not the same as the performance issue that Boris raised. I think the former would be a natural consequence of moving to an optimization-based joint decision-making framework, because such a thing necessarily takes a "good enough" attitude. The issue Boris raised is more efficient tracking of the true state of resources, and I am interested in that issue too. A holistic scheduler needs such tracking, in addition to the needs of the individual services. Having multiple consumers makes the issue more interesting :-)

Regards,
Mike
Re: [openstack-dev] [scheduler] Policy Model
Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/14/2013 11:43:34 PM:
> ... For the policy model, you can expect rows in the DB each representing different policy instances, something like:
> {id: , uuid: SOME-UUID-1, name: anti-colocation-1, type: anti-colocation, properties: {level: rack}}
> {id: , uuid: SOME-UUID-2, name: anti-colocation-2, type: anti-colocation, properties: {level: PM}}
> {id: , uuid: SOME-UUID-3, name: network-reachability-1, type: network-reachability, properties: {}}
> And for the InstanceGroupPolicy model, you can expect rows such as
> {id: 5, policy: SOME-UUID-1, type: group, edge_id: , group_id: 12345}
> {id: 6, policy: SOME-UUID-1, type: group, edge_id: , group_id: 22334}

Do you imagine just one policy object of a given contents, or many? Put another way, would every InstanceGroupPolicy object that wants to apply a rack-level anti-collocation policy use SOME-UUID-1? Who or what created that record? Who or what decides to delete it, and when and why? What about dangling references? It seems to me that needing to answer these questions simply imposes unnecessary burdens. If the type and properties fields of that record were merged inline (replacing the policy: SOME-UUID-1 field) into the records that use it, then there are no hard questions to answer; the group author knows what policies he wants to apply and where, and he simply writes them there.

Regards,
Mike
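To spell out the alternative argued for above, here is a hypothetical sketch of the two shapes side by side; the field names follow the rows quoted above, and the id/UUID values are placeholders:

    # Shape 1: a separately stored Policy row, referenced by UUID from each use.
    policy_row = {"uuid": "SOME-UUID-1", "type": "anti-colocation",
                  "properties": {"level": "rack"}}
    use_row = {"id": 5, "policy": "SOME-UUID-1", "type": "group", "group_id": 12345}

    # Shape 2 (argued for above): the policy's type and properties merged inline
    # into the use (shown here as "policy_type", echoing the rename suggested
    # earlier), so there is no shared row to create, garbage-collect, or leave
    # dangling.
    merged_use_row = {"id": 5, "policy_type": "anti-colocation",
                      "properties": {"level": "rack"}, "type": "group", "group_id": 12345}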
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Regarding Alex's question of which component does holistic infrastructure scheduling, I hesitate to simply answer "heat". Heat is about orchestration, and infrastructure scheduling is another matter. I have attempted to draw pictures to sort this out, see https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g . In those you will see that I identify holistic infrastructure scheduling as separate functionality from infrastructure orchestration (the main job of today's heat engine) and also separate from software orchestration concerns. However, I also see a close relationship between holistic infrastructure scheduling and heat, as should be evident in those pictures too.

Alex made a remark about the needed inputs, and I agree but would like to expand a little on the topic. One thing any scheduler needs is knowledge of the amount, structure, and capacity of the hosting thingies (I wish I could say resources, but that would be confusing) onto which the workload is to be scheduled. Scheduling decisions are made against available capacity. I think the most practical way to determine available capacity is to separately track raw capacity and current (plus already planned!) allocations from that capacity, finally subtracting the latter from the former. In Nova, for example, sensing raw capacity is handled by the various nova-compute agents reporting that information. I think a holistic infrastructure scheduler should get that information from the various individual services (Nova, Cinder, etc.) that it is concerned with (presumably they have it anyway). A holistic infrastructure scheduler can keep track of the allocations it has planned (regardless of whether they have been executed yet). However, there may also be allocations that did not originate in the holistic infrastructure scheduler. The individual underlying services should be able to report (to the holistic infrastructure scheduler, even if lowly users are not so authorized) all the allocations currently in effect. An accurate union of the current and planned allocations is what we want to subtract from raw capacity to get available capacity.

If there is a long delay between planning and executing an allocation, there can be nasty surprises from competitors --- if there are any competitors. Actually, there can be nasty surprises anyway. Any scheduler should be prepared for nasty surprises, and react by some sensible retrying. If nasty surprises are rare, we are pretty much done. If nasty surprises due to the presence of competing managers are common, we may be able to combat the problem by changing the long delay to a short one --- by moving the allocation execution earlier, into a stage that is only about locking in allocations, leaving all the other work involved in creating virtual resources to later (perhaps Climate will be good for this). If the delay between planning and executing an allocation is short and there are many nasty surprises due to competing managers, then you have too much competition between managers --- don't do that.

Debo wants a simpler nova-centric story. OK, how about the following. This is for the first step in the roadmap, where scheduling decisions are still made independently for each VM instance.

For the client/service interface, I think we can do this with a simple clean two-phase interface when traditional software orchestration is in play, a one-phase interface when slick new software orchestration is used. Let me outline the two-phase flow. We extend the Nova API with CRUD operations on VRTs (top-level groups). For example, the CREATE operation takes a definition of a top-level group and all its nested groups, definitions (excepting stuff like userdata) of all the resources (only VM instances, for now) contained in those groups, all the relationships among those groups/resources, and all the applications of policy to those groups, resources, and relationships. This is a REST-style interface: the CREATE operation takes a definition of the thing (a top-level group and all that it contains) being created; the UPDATE operation takes a revised definition of the whole thing. Nova records the presented information; the familiar stuff is stored essentially as it is today (but marked as being in some new sort of tentative state), and the grouping, relationship, and policy stuff is stored according to a model like the one Debo and Yathi wrote. The CREATE operation returns a UUID for the newly created top-level group. The invocation of the top-level group CRUD is a single operation and it is the first of the two phases.

In the second phase of a CREATE flow, the client creates individual resources with the same calls as are used today, except that each VM instance create call is augmented with a pointer into the policy information. That
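A hypothetical client-side sketch of that two-phase flow; the endpoint paths, payload fields, and the use of scheduler_hints to carry the pointer are illustrative guesses, not a settled API:

    # Hypothetical sketch of the two-phase CREATE flow described above.
    import requests  # auth and error handling omitted

    NOVA = "http://nova.example.com/v3.0/TENANT"   # placeholder endpoint

    # Phase 1: register the whole top-level group: nested groups, resource
    # definitions (minus userdata), relationships, and policy applications.
    group_def = {
        "name": "web-tier",
        "members": ["web-0", "web-1"],
        "policies": [{"type": "anti_collocation", "properties": {"level": "rack"},
                      "applies_to": "group"}],
    }
    group_uuid = requests.post(NOVA + "/groups", json=group_def).json()["uuid"]

    # Phase 2: create each resource as today, with each create call augmented by
    # a pointer into the registered policy information: (group UUID, local name).
    for member in group_def["members"]:
        server_def = {"name": member, "flavorRef": "m1.small", "imageRef": "IMAGE-UUID",
                      "scheduler_hints": {"group": group_uuid, "member": member}}
        requests.post(NOVA + "/servers", json={"server": server_def})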
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
I'll be at the summit too. Available Nov 4 if we want to do some prep then. It will be my first summit, I am not sure how overbooked my summit time will be.

Regards,
Mike

From: Sylvain Bauza sylvain.ba...@bull.net
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Cc: Mike Spreitzer/Watson/IBM@IBMUS
Date: 10/11/2013 08:19 AM
Subject: Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

> Long-story short, sounds like we do have the same concerns here in Climate. I'll be present at the Summit, any chance to do an unconference meeting in between all parties?
> Thanks,
> -Sylvain
>
> Le 11/10/2013 08:25, Mike Spreitzer a écrit :
> > Regarding Alex's question of which component does holistic infrastructure scheduling, I hesitate to simply answer "heat". ...
Re: [openstack-dev] [Heat] HOT Software orchestration proposal for workflows
I favor separation of concerns. I think (4), at least, has got nothing to do with infrastructure orchestration, the primary concern of today's heat engine. I advocate (4), but as separate functionality.

Regards,
Mike

Alex Rudenko alexei.rude...@gmail.com wrote on 10/09/2013 12:59:22 PM:
> From: Alex Rudenko alexei.rude...@gmail.com
> To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
> Date: 10/09/2013 01:03 PM
> Subject: Re: [openstack-dev] [Heat] HOT Software orchestration proposal for workflows
>
> Hi everyone,
> I've read this thread and I'd like to share some thoughts. In my opinion, workflows (which run on VMs) can be integrated with heat templates as follows:
> 1. workflow definitions should be defined separately and processed by stand-alone workflow engines (chef, puppet etc).
> 2. the HOT resources should reference workflows which they require, specifying a type of workflow and the way to access a workflow definition. The workflow definition might be provided along with HOT.
> 3. Heat should treat the orchestration templates as transactions (i.e. Heat should be able to roll back in two cases: 1) if something goes wrong during processing of an orchestration workflow 2) when a stand-alone workflow engine reports an error during processing of a workflow associated with a resource)
> 4. Heat should expose an API which enables basic communication between running workflows. Additionally, Heat should provide an API to workflows that allows workflows to specify whether they completed successfully or not. The reference to these APIs should be passed to the workflow engine that is responsible for executing workflows on VMs.
> Pros of each point:
> 1 & 2 - keeps Heat simple and gives a possibility to choose the best workflows and engines among available ones.
> 3 - adds some kind of all-or-nothing semantics, improving the control and awareness of what's going on inside VMs.
> 4 - allows workflow synchronization and communication through the Heat API. Provides the error reporting mechanism for workflows. If a workflow does not need this functionality, it can ignore it.
> Cons:
> - Changes to existing workflows making them aware of Heat existence are required.
> These thoughts might show some gaps in my understanding of how Heat works, but I would like to share them anyway.
> Best regards,
> Oleksii Rudenko
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Yes, there is more than the northbound API to discuss. Gary started us there in the Scheduler chat on Oct 1, when he broke the issues down like this:

11:12:22 AM garyk: 1. a user facing API
11:12:41 AM garyk: 2. understanding which resources need to be tracked
11:12:48 AM garyk: 3. backend implementation

The full transcript is at http://eavesdrop.openstack.org/meetings/scheduling/2013/scheduling.2013-10-01-15.08.log.html

Alex Glikson glik...@il.ibm.com wrote on 10/09/2013 02:14:03 AM:
> Good summary. I would also add that in A1 the schedulers (e.g., in Nova and Cinder) could talk to each other to coordinate. Besides defining the policy, and the user-facing APIs, I think we should also outline those cross-component APIs (need to think whether they have to be user-visible, or can be admin).
> Regards,
> Alex
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Debojyoti Dutta ddu...@gmail.com wrote on 10/09/2013 02:48:26 AM:
> Mike, I agree we could have a cleaner API but I am not sure how cleanly it will integrate with current nova, which IMO should be the test we should pass (assuming we do cross services later)

I think the cleaner APIs integrate with Nova as well as the three-phase API you suggested. Am I missing some obvious impediment?

> > ... To me the most frustrating aspect of this challenge is the need for the client to directly mediate the dependencies between resources; this is really what is driving us to do ugly things. As I mentioned before, I am coming from a setting that does not have this problem. So I am thinking about two alternatives: (A1) how clean can we make a system in which the client continues to directly mediate dependencies between resources, and (A2) how easily and cleanly can we make that problem go away.
>
> Am a little confused - How is the API dictating either A1 or A2? Isn't that a function of the implementation of the API. For a moment let us assume that the black box implementation will be awesome and address your concerns.

I am talking about the client/service interface; it is not (just) a matter of service implementation. My complaint is that the software orchestration technique commonly used prevents us from having a one-phase API for holistic infrastructure scheduling. The commonly used software orchestration technique requires some serialization of the resource creation calls. For example, if one VM instance runs a database and another VM instance runs a web server that needs to be configured with the private IP address of the database, the common technique is for the client to first create the database VM instance, then take the private IP address from that VM instance and use it to compose the userdata that is passed in the Nova call that creates the web server VM instance. That client can not present all at once a fully concrete and literal specification of both VM instances, because the userdata for one is not knowable until the other has been created. The client has to be able to make create-like calls in some particular order rather than ask for all creation at once. If the client could ask for all creation at once then we could use a one-phase API: it simply takes a specification of the resources along with their policies and relationships.

Of course, there is another way out. We do have in OpenStack a technology by which a client can present all at once a specification of many VM instances where the userdata of some depend on the results of creating others. If we were willing to use this technology, we could follow A2. The CREATE flow would go like this:
(i) the client presents the specification of resources (including the computations that link some), with grouping, relationships, and policies, to our new API;
(ii) our new service registers the new topology and (once we advance this far on the development roadmap) does holistic scheduling;
(iii) our new service updates the resource specifications to include pointers into the policy data;
(iv) our new service passes the enhanced resource specifications to that other service that can do the creation calls linked by the prescribed computations;
(v) that other service does its thing, causing a series (maybe with some allowed parallelism) of creation calls, each augmented by the relevant pointer into the policy information;
(vi) the service implementing a creation call gets what it normally does plus the policy pointer, which it follows to get the relevant policy information (at the first step in the development roadmap) or the scheduling decision (in the second step of the development roadmap).

But I am getting ahead of myself here and discussing backend implementation; I think we are still working on the user-facing API.

> The question is this - does the current API help specify what we want assuming we will be able to extend the notion of nodes, edges, policies and metadata?

I am not sure I understand that remark. Of course the API you proposed is about enabling the client to express the policy information that we both advocate. I am not sure I understand why you add the qualifier of "assuming we will be able to extend the notion of nodes, edges, policies and metadata". I do not think we (yet) have a policy type catalog set in stone, if that is the concern. I think there is an interesting discussion to have about defining that catalog. BTW, note that the class you called InstanceGroupPolicy is not just a reference to a policy, it also specifies one place where that policy is being applied. That is really the class of policy applications (or uses). I think some types of policies have parameters. A relationship policy about limiting the number of network hops takes a parameter that is the hop count limit. A policy about anti-collocation takes a physical hierarchy level as a parameter, to put a lower
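A hypothetical sketch of steps (iii) and (vi) from the service side: the new service stamps each resource specification with a pointer into its registered policy data, and the service that later handles the create call dereferences that pointer. All names here are invented for illustration:

    # Hypothetical sketch of steps (iii) and (vi) above.
    registry = {}   # stands in for the new service's persisted groups/decisions

    def register(group_uuid, group_def, decisions):
        registry[group_uuid] = {"definition": group_def, "decisions": decisions}

    def annotate(group_uuid, resource_specs):
        """Step (iii): add a policy pointer to every resource specification."""
        return [dict(spec, policy_pointer={"group": group_uuid, "member": spec["name"]})
                for spec in resource_specs]

    def handle_create(spec):
        """Step (vi): the creating service follows the pointer it was handed."""
        ptr = spec["policy_pointer"]
        decision = registry[ptr["group"]]["decisions"].get(ptr["member"])
        return {"resource": spec["name"], "placed_on": decision}

    register("g-1", {"name": "web-tier"}, {"web-0": "host-17"})
    specs = annotate("g-1", [{"name": "web-0", "flavor": "m1.small"}])
    print(handle_create(specs[0]))   # -> {'resource': 'web-0', 'placed_on': 'host-17'}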
Re: [openstack-dev] [Climate] Questions and comments
Yes, that helps. Please, guys, do not interpret my questions as hostility, I really am just trying to understand. I think there is some overlap between your concerns and mine, and I hope we can work together.

Sticking to the physical reservations for the moment, let me ask for a little more explicit detail. In your outline below, late in the game you write that "the actual reservation is performed by the lease manager plugin". Is that the point in time when something (the lease manager plugin, in fact) decides which hosts will be used to satisfy the reservation? Or is that decided up-front when the reservation is made? I do not understand how the lease manager plugin can make this decision on its own, isn't the nova scheduler also deciding how to use hosts? Why isn't there a problem due to two independent allocators making allocations of the same resources (the system's hosts)?

Thanks,
Mike

Patrick Petit patrick.pe...@bull.net wrote on 10/07/2013 07:02:36 AM:
> Hi Mike,
> There are actually more facets to this. Sorry if it's a little confusing :-( Climate's original blueprint https://wiki.openstack.org/wiki/Blueprint-nova-planned-resource-reservation-api was about physical host reservation only. The typical use case being: I want to reserve x number of hosts that match the capabilities expressed in the reservation request. The lease is populated with reservations which at this point are only capacity descriptors. The reservation becomes active only when the lease starts at a specified time and for a specified duration. The lease manager plugin in charge of the physical reservation has a planning of reservations that allows Climate to grant a lease only if the requested capacity is available at that time. Once the lease becomes active, the user can request instances to be created on the reserved hosts using a lease handle as a Nova's scheduler hint. That's basically it. We do not assume or enforce how and by whom (Nova, Heat, ...) a resource instantiation is performed. In other words, a host reservation is like a whole host allocation (https://wiki.openstack.org/wiki/WholeHostAllocation) that is reserved ahead of time by a tenant in anticipation of some workloads that are bound to happen in the future. Note that while we are primarily targeting hosts reservations the same service should be offered for storage.
> Now, Mirantis brought in a slew of new use cases that are targeted toward virtual resource reservation as explained earlier by Dina. While architecturally both reservation schemes (physical vs virtual) leverage common components, it is important to understand that they behave differently. For example, Climate exposes an API for the physical resource reservation that the virtual resource reservation doesn't. That's because virtual resources are supposed to be already reserved (through some yet to be created Nova, Heat, Cinder, ... extensions) when the lease is created. Things work differently for the physical resource reservation in that the actual reservation is performed by the lease manager plugin not before the lease is created but when the lease becomes active (or some time before, depending on the provisioning lead time) and released when the lease ends.
> HTH clarifying things.
> BR,
> Patrick
Re: [openstack-dev] [Climate] Questions and comments
Sylvain: please do not interpret my questions as hostility. I am only trying to understand your proposal, but I am still confused. Can you please walk through a scenario involving Climate reservations on virtual resources? I mean from start to finish, outlining which party makes which decision when, based on what. I am trying to understand the relationship between the individual resource schedulers (such as nova, cinder) and climate --- they both seem to be about allocating the same resources.

Thanks,
Mike
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Thanks for the clue about where the request/response bodies are documented. Is there any convenient way to view built documentation for Havana right now?

You speak repeatedly of the desire for clean interfaces, and nobody could disagree with such words. I characterize my desire that way too. It might help me if you elaborate a little on what "clean" means to you. To me it is about minimizing the number of interactions between different modules/agents and the amount of information in those interactions. In short, it is about making narrow interfaces - a form of simplicity.

To me the most frustrating aspect of this challenge is the need for the client to directly mediate the dependencies between resources; this is really what is driving us to do ugly things. As I mentioned before, I am coming from a setting that does not have this problem. So I am thinking about two alternatives: (A1) how clean can we make a system in which the client continues to directly mediate dependencies between resources, and (A2) how easily and cleanly can we make that problem go away.

For A1, we need the client to make a distinct activation call for each resource. You have said that we should start the roadmap without joint scheduling; in this case, the scheduling can continue to be done independently for each resource and can be bundled with the activation call. That can be the call we know and love today, the one that creates a resource, except that it needs to be augmented to also carry some pointer that points into the policy data so that the relevant policy data can be taken into account when making the scheduling decision. Ergo, the client needs to know this pointer value for each resource. The simplest approach would be to let that pointer be the combination of (p1) a VRT's UUID and (p2) the local name for the resource within the VRT. Other alternatives are possible, but require more bookkeeping by the client.

I think that at the first step of the roadmap for A1, the client/service interaction for CREATE can be in just two phases. In the first phase the client presents a topology (top-level InstanceGroup in your terminology), including resource definitions, to the new API for registration; the response is a UUID for that registered top-level group. In the second phase the client creates the resources as is done today, except that each creation call is augmented to carry the aforementioned pointer into the policy information. Each resource scheduler (just nova, at first) can use that pointer to access the relevant policy information and take it into account when scheduling. The client/service interaction for UPDATE would be in the same two phases: first update the policy/resource definitions at the new API, then do the individual resource updates in dependency order.

I suppose the second step in the roadmap is to have Nova do joint scheduling. The client/service interaction pattern can stay the same. The only difference is that Nova makes the scheduling decisions in the first phase rather than the second. But that is not a detail exposed to the clients. Maybe the third step is to generalize beyond nova?

For A2, the first question is how to remove user-level create-time dependencies between resources. We are only concerned with the user-level create-time dependencies here because it is only they that drive intimate client interactions. There are also create-time dependencies due to the nature of the resource APIs; for example, you can not attach a volume to a VM until after both have been created. But handling those kinds of create-time dependencies does not require intimate interactions with the client. I know of two software orchestration technologies developed in IBM, and both have the property that there are no user-level create-time dependencies between resources; rather, the startup code (userdata) that each VM runs handles dependencies (using a library for cross-VM communication and synchronization). This can even be done in plain CFN, using wait conditions and handles (albeit somewhat clunkily), right? So I think there are ways to get this nice property already. The next question is how best to exploit it to make cleaner APIs. I think we can have a one-step client/service interaction: the client presents a top-level group (including leaf resource definitions) to the new service, which registers it and proceeds to create/schedule/activate the resources.

Regards,
Mike
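To show in miniature what "the startup code handles the dependency" looks like, here is a hypothetical sketch in which two threads stand in for the two VMs and a queue stands in for a wait-condition-like channel; the web side blocks until the database side publishes its address, so the client never has to serialize the two create calls:

    # Hypothetical sketch: dependency handling pushed into per-VM startup code.
    import threading, queue, time

    db_address = queue.Queue(maxsize=1)   # plays the role of a wait condition/handle

    def db_startup():
        time.sleep(0.1)                   # pretend to install and start the database
        db_address.put("10.0.0.5")        # publish the address / signal readiness

    def web_startup():
        ip = db_address.get()             # block until the database side signals
        print("configuring web server against", ip)

    threads = [threading.Thread(target=f) for f in (db_startup, web_startup)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()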
Re: [openstack-dev] [Climate] Questions and comments
Do not worry about what I want; right now I am just trying to understand the Climate proposal wrt virtual resources (Patrick helped a lot on the physical side). Can you please walk through a scenario involving Climate reservations on virtual resources? I mean from start to finish, outlining which party makes which decision, based on what.

Thanks,
Mike

From: Sylvain Bauza sylvain.ba...@bull.net
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Cc: Mike Spreitzer/Watson/IBM@IBMUS
Date: 10/07/2013 05:07 AM
Subject: Re: [openstack-dev] [Climate] Questions and comments

> Hi Mike,
> Dina and you outlined some differences in terms of seeing what is dependent on what. As Dina explained, Climate plans to be integrated into Nova and Heat logics, where Heat and Nova would request the Climate API by asking for a lease and would tag on their own the resources as 'RESERVED'. On your point, and correct me if I'm wrong, you would rather see Climate on top of Heat and Nova, scheduling resources on its own, and only sending creation requests to Heat and Nova.
> I'm happy to say both of you are right: Climate aims to be both called by Nova and *also* calling Nova. That's just a matter of what Climate *is*. And here is the confusion. That's why Climate is not only one API endpoint. It actually has two distinct endpoints: one called the Lease API endpoint, and one called the Resource Reservation API endpoint.
> As a Climate developer working on physical hosts reservations (and not Heat stacks), my concern is to be able to guarantee to a REST client (either a user or another service) that if this user wants to provision X hosts on a specific timeframe in the future (immediate or in 10 years), Climate will be able to provision them. By meaning "being able" and "guarantee", I do use strong words for stating that we engage ourselves to be able to plan what the resource capacity state will be in the future. This decision-making process (i.e. this Climate scheduler) will be implemented as an RPC Service for the Reservation API, and thus will need to keep its own persistence layer in Climate. Of course, it will request the Lease API for really creating the lease and managing lease start/end hooks; that's the Lease API job.
> Provided you would want to use the Reservation API for reserving Heat stacks, you would have to implement it though.
> Thanks,
> -Sylvain
>
> Le 06/10/2013 20:41, Mike Spreitzer a écrit :
> > Thanks, Dina. Yes, we do not understand each other; can I ask some more questions? ...
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
Thanks. I have a few questions. First, I am a bit stymied by the style of API documentation used in that document and many others: it shows the first line of an HTTP request but says nothing about all the other details. I am sure some of those requests must have interesting bodies, but I am not always sure which ones have a body at all, let alone what goes in it. I suspect there may be some headers that are important too. Am I missing something?

That draft says the VMs are created before the group. Is there a way today to create a VM without scheduling it?

As I understand your draft, it lays out a three-phase process for a client to follow: create resources without scheduling or activating them, then arrange them into groups, then schedule and activate them. By "activate" I mean, for a VM instance, to start running it. That ordering must hold independently for each resource. Activations are invoked by the client in an order that is consistent with (a) runtime dependencies that are mediated directly by the client (e.g., string slinging in the heat engine) and (b) the nature of the resources (for example, you can not attach a volume to a VM instance until after both have been created). Other than those considerations, the ordering and/or parallelism is a degree of freedom available to the client. Have I got this right?

Couldn't we simplify this into a two-phase process: create groups and resources with scheduling, then activate the resources in an acceptable order?

FYI: my group is using Weaver as the software orchestration technique, so there are no runtime dependencies that are mediated directly by the client. The client sees a very simple API: the client presents a definition of all the groups and resources, and the service first schedules it all then activates in an acceptable order. (We already have something in OpenStack that can do activations in an acceptable order, right?) Weaver is not the only software orchestration technique with this property. The simplicity of this API is one reason I recommend software orchestration techniques that take dependency mediation out of the client's hands. I hope that with coming work on HOT we can get OpenStack to this level of API simplicity. But that struggle lies farther down the roadmap...

Thanks,
Mike

Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/07/2013 11:10:20 PM:
> Hi,
> Based on the discussions we have had in the past few scheduler sub-team meetings, I am sharing a document that proposes an updated Instance Group Model and API extension model. This is a work-in-progress draft version, but sharing it for early feedback.
> https://docs.google.com/document/d/17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit?usp=sharing
> This model supports generic instance types, where an instance can represent a virtual node of any resource type. But in the context of Nova, an instance refers to the VM instance.
> This builds on the existing proposal for Instance Group Extension as documented here in this blueprint: https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
> Thanks,
> Yathi.
Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft
In addition to the other questions below, I was wondering if you could explain why you included all those integer IDs; aren't the UUIDs sufficient?

Thanks,
Mike

From: Mike Spreitzer/Watson/IBM@IBMUS
To: Yathiraj Udupi (yudupi) yud...@cisco.com
Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Date: 10/08/2013 12:41 AM
Subject: Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

> Thanks. I have a few questions. First, I am a bit stymied by the style of API documentation used in that document and many others: it shows the first line of an HTTP request but says nothing about all the other details. ...
Re: [openstack-dev] [Climate] Questions and comments
Thanks, Dina. Yes, we do not understand each other; can I ask some more questions? You outlined a two-step reservation process (We assume the following reservation process for the OpenStack services...), and right after that talked about changing your mind to use Heat instead of individual services. So I am confused, I am not sure which of your remarks reflect your current thinking and which reflect old thinking. Can you just state your current thinking? On what basis would Climate decide to start or stop a lease? What sort of event notifications would Climate be sending, and when and why, and what would subscribers do upon receipt of such notifications? If the individual resource services continue to make independent scheduling decisions as they do today, what value does Climate add? Maybe a little more detailed outline of what happens in your current thinking, in support of an explicitly stated use case that shows the value, would help here. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)
FYI, I have refined my pictures at https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g to hopefully make it clearer that I agree with the sentiment that holistic infrastructure scheduling should not be part of heat but is closely related, and to make a graphical illustration of why I prefer the ordering of functionality that I do (the boundary between software and infrastructure issues gets less squiggly). Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Maybe the answer is hiding in plain sight: host aggregates. This is a concept we already have, and it allows identification of arbitrary groupings for arbitrary purposes.___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling)
Clint Byrum cl...@fewbar.com wrote on 10/01/2013 02:38:53 AM: From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org, Date: 10/01/2013 02:40 AM Subject: Re: [openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling) Mike, this has been really fun, but it is starting to feel like a rabbit hole. The case for having one feels legitimate. However, at this point, I think someone will need to actually build it, or the idea is just a pipe dream. Yes, Clint, I and colleagues are interested in building it. I think Debo and Yathi are too. And we are finding intersections with other projects. I am still new here and learning lots, so outputs have not come as fast as I had initially hoped. But I appreciate being able to have discussions and draw on the wisdom of the group. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling)
OK, let's take the holistic infrastructure scheduling out of Heat. It really belongs at a lower level anyway. Think of it as something you slap on top of Nova, Cinder, Neutron, etc. and everything that is going to use them goes first through the holistic scheduler, to give it a chance to make some joint decisions. Zane has been worried about conflicting decisions being made, but if everything goes through the holistic infrastructure scheduling service then there does not need to be an issue with other parallel decision-making services (more on this below). For a public cloud, think of this holistic infrastructure scheduling as part of the service that the cloud offers to the public; the public says what it wants, and the various levels of schedulers work on delivering it; the internals are not exposed to the public. For example, a cloud user may say spread my cluster across at least two racks, not too unevenly; you do not want that public cloud customer to be in the business of knowing how many racks are in the cloud, knowing how much each one is currently being used, and picking which rack will contain which members of his cluster. For a private cloud, the holistic infrastructure scheduler should have the same humility as the lower schedulers: offer enough visibility and control to the clients that they can make decisions if they want to (thus, nobody needs to go around the holistic infrastructure scheduler if they already know what they want). You do not want to ask the holistic infrastructure scheduler to schedule resources one by one; you want to ask it to allocate a whole pattern/template/topology. There is thus no need for infrastructure orchestration prior to holistic infrastructure scheduling. Once the holistic infrastructure scheduler has done its job, there is a need for infrastructure orchestration. What should we use for that? OK, more on the business of conflicting decisions. For the sake of scalability and modularity, the holistic infrastructure scheduler should delegate as much decision-making as it can to more specific services. The job of the holistic infrastructure scheduler is to make joint decisions when there are strong interactions between services. You can fudge this either way (have the holistic infrastructure scheduler make more or fewer decisions than ideal), but if you want the best results then I think the principle I stated is what should guide the division. So what if a delegated decision conflicts with a holistic decision? Don't do that. Divide the decision-making responsibilities into distinct domains, for example with the holistic scheduler making relatively big-picture decisions and individual resource services filling in the details. That said, there can still be nasty surprises from lower layers. Even if the design has carefully partitioned decision-making responsibilities, irregular things can still happen (e.g., authorized people can do something unexpected). Even if nothing intentionally does anything irregular, there remains the possibility of bugs. The holistic infrastructure scheduler should be prepared for nasty surprises, and should get information that is as authoritative as possible to begin with (promptness doesn't hurt either). Then there is the question of the scalability of the holistic infrastructure scheduler. One hard kernel of that is solving the optimization problem. Nobody should expect the scheduler to find the truly optimal solution; this is an NP-hard problem. 
However, there exist optimization algorithms that produce pretty good approximations in modest amounts of time. Additionally: if the patterns are small relative to the size of the whole zone being scheduled then it should be possible to do concurrent decision-making with optimistic concurrency control (as Clint has mentioned). You would not want one holistic infrastructure scheduler for a whole geographically distributed cloud. You could use a hierarchical arrangement, with one top-level decision-maker dividing a pattern between availability zones (by which I mean the sort of large independent domains that are typically known by that term) and then a subsidiary scheduler for each availability zone. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
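To make concrete what a "pretty good approximation in a modest amount of time" can look like, here is a minimal sketch, in Python, of a greedy heuristic that spreads a group's members across racks. The host names, rack names, and capacity numbers are made up for illustration; a real holistic scheduler would work from authoritative inventory and solve a much richer optimization problem.

from collections import defaultdict

hosts = {                      # host -> (rack, free_vcpus); purely illustrative
    "h1": ("rack1", 8), "h2": ("rack1", 2),
    "h3": ("rack2", 6), "h4": ("rack3", 4),
}

def place_spread(n_members, vcpus_each):
    free = dict(hosts)
    used_racks = defaultdict(int)
    placement = {}
    for member in range(n_members):
        # Prefer hosts in racks we have used least, then the most free capacity.
        candidates = [h for h, (_, cap) in free.items() if cap >= vcpus_each]
        if not candidates:
            raise RuntimeError("cannot satisfy request")
        best = min(candidates,
                   key=lambda h: (used_racks[free[h][0]], -free[h][1]))
        rack, cap = free[best]
        placement["member-%d" % member] = best
        free[best] = (rack, cap - vcpus_each)
        used_racks[rack] += 1
    return placement

print(place_spread(3, 2))
# {'member-0': 'h1', 'member-1': 'h3', 'member-2': 'h4'}  -- three racks used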
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Alex Glikson glik...@il.ibm.com wrote on 09/29/2013 03:30:35 PM: Mike Spreitzer mspre...@us.ibm.com wrote on 29/09/2013 08:02:00 PM: Another reason to prefer host is that we have other resources to locate besides compute. Good point. Another approach (not necessarily contradicting) could be to specify the location as a property of host aggregate rather than individual hosts (and introduce similar notion in Cinder, and maybe Neutron). This could be an evolution/generalization of the existing 'availability zone' attribute, which would specify a more fine-grained location path (e.g., 'az_A:rack_R1:chassis_C2:node_N3'). We briefly discussed this approach at the previous summit (see 'simple implementation' under https://etherpad.openstack.org/HavanaTopologyAwarePlacement) -- but unfortunately I don't think we made much progress with the actual implementation in Havana (would be good to fix this in Icehouse). Thanks for the background. I can still see the etherpad, but the old summit proposal to which it points is gone. The etherpad proposes an API, and leaves open the question of whether it backs onto a common service. I think that is a key question. In my own group's work, this sort of information is maintained in a shared database. I'm not sure what is the right approach for OpenStack. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
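As an illustration of what the fine-grained location path buys you, here is a minimal sketch of comparing two such paths to decide whether two placements share an AZ, a rack, or a chassis. The example paths follow Alex's 'az_A:rack_R1:chassis_C2:node_N3' form; the depth numbers are my own assumption about how the levels would be ordered.

def location_levels(path):
    # 'az_A:rack_R1:chassis_C2:node_N3' -> ['az_A', 'rack_R1', 'chassis_C2', 'node_N3']
    return path.split(":")

def share_level(path_a, path_b, depth):
    # True if the two paths agree on the first `depth` components,
    # e.g. depth=2 asks "same AZ and same rack?".
    return location_levels(path_a)[:depth] == location_levels(path_b)[:depth]

a = "az_A:rack_R1:chassis_C2:node_N3"
b = "az_A:rack_R1:chassis_C7:node_N9"
print(share_level(a, b, 2))   # True  -> same rack
print(share_level(a, b, 3))   # False -> different chassis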
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Robert Collins robe...@robertcollins.net wrote on 09/29/2013 02:21:28 AM: Host not hypervisor I think; consider nova baremetal, where hypervisor == machine that runs tftpd and makes IPMI calls, and host == place where the user workload will execute. In nova baremetal, is there still a hypervisor in the picture, and is it necessarily the same machine as the host? Another reason to prefer host is that we have other resources to locate besides compute. But the current API maps a host to a list of uniformly-shaped contents; it is not obvious to me what would be a good way to extend this. Any ideas? Following is an example; it is the result of a GET on http://novahost:port/v2/tenantid/os-hosts/hostname

{
    "host": [
        {"resource": {"project": "(total)", "memory_mb": 96661, "host": "x3630r7n8", "cpu": 32, "disk_gb": 2216}},
        {"resource": {"project": "(used_now)", "memory_mb": 70144, "host": "x3630r7n8", "cpu": 34, "disk_gb": 880}},
        {"resource": {"project": "(used_max)", "memory_mb": 69632, "host": "x3630r7n8", "cpu": 34, "disk_gb": 880}},
        {"resource": {"project": "5e5e2b0da114499b838c8d24c31bea08", "memory_mb": 69632, "host": "x3630r7n8", "cpu": 34, "disk_gb": 880}}
    ]
}
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
Monty Taylor mord...@inaugust.com wrote on 09/29/2013 01:38:26 PM: On 09/29/2013 01:02 PM, Mike Spreitzer wrote: Robert Collins robe...@robertcollins.net wrote on 09/29/2013 02:21:28 AM: Host not hypervisor I think; consider nova baremetal, where hypervisor == machine that runs tftpd and makes IPMI calls, and host == place where the user workload will execute. In nova baremetal, is there still a hypervisor in the picture, and is it necessarily the same machine as the host? There are one or more machines where nova-compute runs. Those machines are necessarily _not_ the same machine as the host. So the host is the bare metal machine where the user's image is instantiated and run, and some other machine runs nova-compute to help set that up, right? When I do a GET on http://novahost:port/v2/tenantid/servers/instanceid today, in a Grizzly installation running VM instances on KVM hypervisors, I get back a bunch of attributes, including hostId (whose value is a long hex string), OS-EXT-SRV-ATTR:host (whose value is a short name), and OS-EXT-SRV-ATTR:hypervisor_hostname (whose value is the same short name as OS-EXT-SRV-ATTR:host). Those short names are the same ones appearing in the reply to a GET of http://novahost:port/v2/tenantid/os-hosts and also in the reply to a GET of http://novahost:port/v2/tenantid/os-hypervisors. In the case of baremetal, will a GET of http://novahost:port/v2/tenantid/os-hypervisors return things related to baremetal and, if so, which ones (the nova-compute machines or the hosts)? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information
I have begun drafting a blueprint about more detailed host/hypervisor location information, to support the sort of policy-informed placement decision-making that Debo, Yathi, and I have been talking about. The blueprint is at https://blueprints.launchpad.net/nova/+spec/hypervisor-location-attribute and the details are at https://wiki.openstack.org/wiki/Nova/HypervisorExtendedAttributes You see I am a little schizophrenic here about scope. The blueprint is named quite narrowly, and the details page is named more broadly; this is because I am not sure what else you folks will chime in with. I am not sure whether this information should really be attached to a hypervisor or to a host. I proposed hypervisor because currently the details for a hypervisor are a map (easily extended) whereas the details for a host are currently a list of uniformly-typed contents (not so easily extended). But host might actually be more appropriate. I am looking for feedback on whether/how to go that way. BTW, where would I find documentation on host details? The current page on nova extensions ( http://docs.openstack.org/api/openstack-compute/2/content/ext-compute.html ) is lacking most of them. You will see that I have proposed what the API looks like, but not the source of this additional information. I will ask my colleagues who have something like this locally, how they got it done and what they would recommend to OpenStack. Perhaps you good folks have some suggestions. Is there obviously one way to do it? Is it obvious that there can be no one way and so a plug point is required? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [scheduler] [heat] Policy specifics
I have begun to draft some specifics about the sorts of policies that might be added to infrastructure to inform a smart unified placement engine. These are cast as an extension to Heat templates. See https://wiki.openstack.org/wiki/Heat/PolicyExtension . Comments solicited. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] [heat] Policy specifics
Stephen Gran stephen.g...@theguardian.com wrote on 09/27/2013 04:26:37 AM: Maybe I'm missing something obvious, but I'm not convinced all that logic belongs in Heat. I would expect nova and related components to expose grouping information (availability zones in nova, networks in quantum, etc) and for end users to supply the group by information. Yes, this additional policy information is not intended to inform infrastructure orchestration. It is intended to inform something that I have been calling holistic infrastructure scheduling and others have called things like unified resource placement and smart resource placement. I frame it as an extension to Heat templates because this policy information needs to be added to a statement about a whole pattern/template/topology and Heat templates are the language we have for such things. The idea is that holistic infrastructure scheduling comes before infrastructure orchestration; by the time infrastructure orchestration happens, the policy information has been handled and removed (or, possibly, encapsulated in some way for downstream processing --- but that's another story I am not trying to broach yet). I have been discussing this outline here under the subject Bringing things together for Icehouse ( http://lists.openstack.org/pipermail/openstack-dev/2013-September/015118.html ), in the scheduler subgroup and heat weekly IRC meetings, and have a design summit proposal (http://summit.openstack.org/cfp/details/113). I think that your use case for anti-collocation (which is a very good and important use case, don't get me wrong) is covered by using availability zones/cells/regions and so on as they are, and doesn't require much logic internal to Heat beyond obeying the constraint specified by a user. If there are five racks in the system and I want to say that two VMs should be placed on different racks, how do I say that with AZs without being overly specific? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
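To make that question concrete: what I want the user to be able to write is a constraint that never names a rack, and what I want a holistic scheduler to do is choose (and check) placements against it. Here is a minimal sketch; the policy dictionary is a hypothetical representation of the anti-collocation idea, not the actual syntax on the wiki page, and it reuses the colon-separated location paths discussed elsewhere in these threads.

# Hypothetical policy: spread these two members across racks, whatever
# and however many racks the cloud happens to have.
anti_collocation = {
    "type": "anti-collocation",
    "level": "rack",
    "applies_to": ["web_server_1", "web_server_2"],
}

def satisfies(policy, placements, locations):
    # placements: resource name -> host; locations: host -> location path
    depth = {"az": 1, "rack": 2, "chassis": 3}[policy["level"]]
    paths = [tuple(locations[placements[r]].split(":")[:depth])
             for r in policy["applies_to"]]
    return len(set(paths)) == len(paths)   # all distinct at that level

placements = {"web_server_1": "h1", "web_server_2": "h3"}
locations = {"h1": "az_A:rack_R1:chassis_C2:node_N3",
             "h3": "az_A:rack_R4:chassis_C1:node_N8"}
print(satisfies(anti_collocation, placements, locations))   # True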
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)
Zane Bitter zbit...@redhat.com wrote on 09/27/2013 08:24:49 AM: Your diagrams clearly show scheduling happening in a separate stage to (infrastructure) orchestration, which is to say that at the point where resources are scheduled, their actual creation is in the *future*. I am not a Climate expert, but it seems to me that they have a near-identical problem to solve: how do they integrate with Heat such that somebody who has reserved resources in the past can actually create them (a) as part of a Heat stack or (b) as standalone resources, at the user's option. IMO OpenStack should solve this problem only once. If I understand correctly, what Climate adds to the party is planning allocations to happen at some specific time in the non-immediate future. A holistic infrastructure scheduler is planning allocations to happen just as soon as we can get the plans through the relevant code path, which is why I describe it as now. If I understood your remarks correctly, we agree that there is no (known) reason that the scheduling has to occur in the middle of orchestration (which would have implied that it needed to be incorporated in some sense into Heat). If you agree that by orchestration you meant specifically infrastructure orchestration then we are agreed. If software orchestration is also in the picture then I also agree that holistic infrastructure scheduling does not *have to* go in between software orchestration and infrastructure orchestration --- but I think that's a pretty good place for it. Right, so what I'm saying is that if all those things are _stated_ in the input then there's no need to run the orchestration engine to find out what they'll be; they're already stated. Yep. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)
Sorry, I was a bit too hasty in writing the last part of my last message; I forgot to qualify software orchestration to indicate I am speaking only of its preparatory phase. I should have written: Zane Bitter zbit...@redhat.com wrote on 09/27/2013 08:24:49 AM: ... If I understood your remarks correctly, we agree that there is no (known) reason that the scheduling has to occur in the middle of orchestration (which would have implied that it needed to be incorporated in some sense into Heat). If you agree that by orchestration you meant specifically infrastructure orchestration then we are agreed. If software orchestration preparation is also in the picture then I also agree that holistic infrastructure scheduling does not *have to* go in between software orchestration preparation and infrastructure orchestration --- but I think that's a pretty good place for it. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] [heat] Policy specifics
Stephen Gran stephen.g...@theguardian.com wrote on 09/27/2013 10:46:09 AM: If the admins of the openstack install wanted users to be able to select placement by rack, surely the availability zones would be rack1 - rack5 ? In this case, the user would write:

Resources : {
  MyASG : {
    Type : AWS::AutoScaling::AutoScalingGroup,
    Properties : {
      AvailabilityZones : { Fn::GetAZs : },
      MinSize : 2,
      DesiredSize : 2,
      MaxSize : 2,
    }
  },

This should naturally implement placement as spread evenly across AZs.

You have added that DesiredSize property, to convey the idea of spreading across at least some number (2 in this case) of AZs, right? That is, it is not functionality in today's Nova, rather something we could add. What if the cloud in question has several levels of structure available, and the admins want users to be able to spread at any of the available levels? I think maybe this is where I think my disagreement is. Heat should be able to express a user preference for placement, but only within the bounds of the policy already created by the admins of the nova install. To have Heat have more knowledge than what is available via the nova API seems overcomplicated and fragile to me. If the nova API should grow some extensions to make more complicated placement algorithms available, then that's an argument that might have legs. I am trying to find a way to introduce holistic infrastructure scheduling, whose purpose in life is to do what Nova, Cinder, etc can not do on their own. (Making Nova, Cinder, and friends more capable on their own is also good, and complements what I am advocating.) Yes, this requires some more visibility and control from those individual services. Those needs have come up in vague ways in the discussion so far, and I plan to write something specific. I am distinctly NOT advocating mixing holistic infrastructure scheduling up with infrastructure orchestration. I think holistic infrastructure scheduling happens prior to infrastructure orchestration. The more debatable question is the ordering with respect to the preparatory stage of software orchestration. Another complicating factor is what happens when the infrastructure orchestration calls out to something that makes a nested stack. Regards, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [scheduler] [heat] Policy specifics
Clint Byrum cl...@fewbar.com wrote on 09/27/2013 11:58:16 AM: From: Clint Byrum cl...@fewbar.com To: openstack-dev openstack-dev@lists.openstack.org, Date: 09/27/2013 12:01 PM Subject: Re: [openstack-dev] [scheduler] [heat] Policy specifics ... Mike, These are not the kinds of specifics that are of any help at all in figuring out how (or, indeed, whether) to incorporate holistic scheduling into OpenStack. I agree that the things in that page are a wet dream of logical deployment fun. However, I think one can target just a few of the basic ones, and see a real achievable case forming. I think I grasp Mike's ideas, so I'll respond to your concerns with what I think. Note that it is highly likely I've gotten some of this wrong.

It remains to be seen whether those things can be anything more than a wet dream for OpenStack, but they are running code elsewhere, so I have hope. What I wrote is pretty much a dump of what we have. The exception is the network bandwidth stuff, which our holistic infrastructure scheduler currently ignores because we do not have a way to get the relevant capacity information from the physical infrastructure. Part of the agenda here is to nudge Neutron to improve in that way.

- What would a holistic scheduling service look like? A standalone service? Part of heat-engine?

I see it as a preprocessor of sorts for the current infrastructure engine. It would take the logical expression of the cluster and either turn it into actual deployment instructions or respond to the user that it cannot succeed. Ideally it would just extend the same Heat API.

My own expectation is that it would be its own service, preceding infrastructure orchestration in the flow. Alternatively, we could bundle holistic infrastructure scheduling, infrastructure orchestration, and software orchestration preparation together under one API but still maintained as fairly separate modules of functionality. Or various in-between ideas. I do not yet have a strong reason for one choice over another. I have been looking to gain cluefulness from discussion with you folks.

- How will the scheduling service reserve slots for resources in advance of them being created? How will those reservations be accounted for and billed?

- In the event that slots are reserved but those reservations are not taken up, what will happen?

I don't see the word reserve in Mike's proposal, and I don't think this is necessary for the more basic models like Collocation and Anti-Collocation. Reservations would of course make the scheduling decisions more likely to succeed, but it isn't necessary if we do things optimistically. If the stack create or update fails, we can retry with better parameters. The raw truth of the matter is that even Nova has this problem already. The real ground truth of resource usage is in the hypervisor, not Nova. When Nova makes a decision, it really is provisional until confirmed by the hypervisor.

I have heard of cases, in different cloud software, where the thing making the placement decisions does not have a truly accurate picture of the resource usage. These are typically caused by corner cases in failure scenarios, where the decision maker thinks something did not happen or was successfully deleted but in reality there is a zombie left over consuming some resources in the hypervisor. There are probably cases where this can happen in OpenStack too, I am guessing. Also, OpenStack does not prevent someone from going around Nova and directly asking a hypervisor to do something. 
- Once scheduled, how will resources be created in their proper slots as part of a Heat template?

In goes a Heat template (sorry for not using HOT.. still learning it. ;)

Resources:
  ServerTemplate:
    Type: Some::Defined::ProviderType
  HAThing1:
    Type: OS::Heat::HACluster
    Properties:
      ClusterSize: 3
      MaxPerAZ: 1
      PlacementStrategy: anti-collocation
      Resources: [ ServerTemplate ]

And if we have at least 2 AZ's available, it feeds to the heat engine:

Resources:
  HAThing1-0:
    Type: Some::Defined::ProviderType
    Parameters:
      availability-zone: zone-A
  HAThing1-1:
    Type: Some::Defined::ProviderType
    Parameters:
      availability-zone: zone-B
  HAThing1-2:
    Type: Some::Defined::ProviderType
    Parameters:
      availability-zone: zone-A

If not, holistic scheduler says back I don't have enough AZ's to satisfy MaxPerAZ.

Actually, I was thinking something even simpler (in the simple cases :-). By simple cases I mean where the holistic infrastructure scheduler makes all the placement decisions. In that case, it only needs to get Nova to implement the decisions already made. So the API call or template fragment for a VM instance would include an AZ parameter that specifies the particular host already chosen for that VM instance. Similarly for
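As a footnote, here is a minimal Python sketch of the preprocessing step Clint describes above, under the assumption (mine, for illustration only) that the group resource carries ClusterSize and MaxPerAZ properties and that members are handed availability zones round-robin; neither OS::Heat::HACluster nor this expansion exists today.

def expand_ha_cluster(name, member_template, group_props, availability_zones):
    # Expand one group resource into ClusterSize concrete member resources,
    # spreading them round-robin over the given availability zones.
    size = group_props["ClusterSize"]
    max_per_az = group_props.get("MaxPerAZ", size)
    if size > max_per_az * len(availability_zones):
        raise ValueError("not enough AZs to satisfy MaxPerAZ")
    expanded = {}
    for i in range(size):
        member = dict(member_template)          # shallow copy per member
        member["Parameters"] = {
            "availability-zone": availability_zones[i % len(availability_zones)]}
        expanded["%s-%d" % (name, i)] = member
    return expanded

print(expand_ha_cluster("HAThing1",
                        {"Type": "Some::Defined::ProviderType"},
                        {"ClusterSize": 3, "MaxPerAZ": 2},
                        ["zone-A", "zone-B"]))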
Re: [openstack-dev] [scheduler] [heat] Policy specifics
Zane also raised an important point about value. Any scheduler is serving one master most directly, the cloud provider. Any sane cloud provider has some interest in serving the interests of the cloud users, as well as having some concerns of its own. The way my group has resolved this is in the translation from the incoming requests to the underlying optimization problem that is solved for placement; in that translation we fold in the cloud provider's interests as well as the cloud user's. We currently have a fixed opinion of the cloud provider's interests; generalizing that is a possible direction for future progress. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre
I agree that such a thing is useful for scheduling. I see a bit of a tension here: for software engineering reasons we want some independence, but we also want to avoid wasteful duplication. I think we are collectively backing into the problem of metamodeling for datacenters, and establishing one or more software thingies that will contain/communicate datacenter models. A collection of nodes annotated with tags is a metamodel. You could define a graph-based metamodel without mandating any particular graph shape. You could be more prescriptive and mandate a tree shape as a good compromise between flexibility and making something that is reasonably easy to process. We can debate what the metamodel should be, but that is different from debating whether there is a metamodel. Regards, Mike From: Tomas Sedovic tsedo...@redhat.com To: openstack-dev@lists.openstack.org, Date: 09/25/2013 10:37 AM Subject:Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre On 09/25/2013 05:15 AM, Robert Collins wrote: One of the major things Tuskar does is model a datacenter - which is very useful for error correlation, capacity planning and scheduling. Long term I'd like this to be held somewhere where it is accessible for schedulers and ceilometer etc. E.g. network topology + switch information might be held by neutron where schedulers can rely on it being available, or possibly held by a unified topology db with scheduler glued into that, but updated by neutron / nova / cinder. Obviously this is a) non-trivial and b) not designed yet. However, the design of Tuskar today needs to accomodate a few things: - multiple reference architectures for clouds (unless there really is one true design) - the fact that today we don't have such an integrated vertical scheduler. So the current Tuskar model has three constructs that tie together to model the DC: - nodes - resource classes (grouping different types of nodes into service offerings - e.g. nodes that offer swift, or those that offer nova). - 'racks' AIUI the initial concept of Rack was to map to a physical rack, but this rapidly got shifted to be 'Logical Rack' rather than physical rack, but I think of Rack as really just a special case of a general modelling problem.. Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on the same L2 network (in a setup where you would group nodes into isolated L2 segments). Which kind of suggests we come up with a better name. I agree there's a lot more useful stuff to model than just racks (or just L2 node groups). From a deployment perspective, if you have two disconnected infrastructures, thats two AZ's, and two underclouds : so we know that any one undercloud is fully connected (possibly multiple subnets, but one infrastructure). When would we want to subdivide that? One case is quick fault aggregation: if a physical rack loses power, rather than having 16 NOC folk independently investigating the same 16 down hypervisors, one would prefer to identify that the power to the rack has failed (for non-HA powered racks); likewise if a single switch fails (for non-HA network topologies) you want to identify that that switch is down rather than investigating all the cascaded errors independently. A second case is scheduling: you may want to put nova instances on the same switch as the cinder service delivering their block devices, when possible, or split VM's serving HA tasks apart. (We currently do this with host aggregates, but being able to do it directly would be much nicer). 
Lastly, if doing physical operations like power maintenance or moving racks around in a datacentre, being able to identify machines in the same rack can be super useful for planning, downtime announcements, or host evacuation, and being able to find a specific machine in a DC is also important (e.g. what shelf in the rack, what cartridge in a chassis). I agree. However, we should take care not to commit ourselves to building a DCIM just yet. Back to 'Logical Rack' - you can see then that having a single construct to group machines together doesn't really support these use cases in a systematic fashion:- Physical rack modelling supports only a subset of the location/performance/failure use cases, and Logical rack doesn't support them at all: we're missing all the rich data we need to aggregate faults rapidly : power, network, air conditioning - and these things cover both single machine/groups of machines/racks/rows of racks scale (consider a networked PDU with 10 hosts on it - that's a fraction of a rack). So, what I'm suggesting is that we model the failure and performance domains directly, and include location (which is the incremental data racks add once failure and performance domains are modelled) too. We can separately noodle on
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)
Clint wrote: There is a third stealth-objective that CFN has caused to linger in Heat. That is packaging cloud applications. By allowing the 100% concrete CFN template to stand alone, users can ship the template. IMO this marrying of software assembly, config, and orchestration is a concern unto itself, and best left outside of the core infrastructure orchestration system. I favor separation of concerns. I do not follow what you are suggesting about how to separate these particular concerns. Can you elaborate? Clint also wrote: A ruby DSL is not something I think is ever going to happen in OpenStack. Ruby is particularly good when the runtime scripting is done through chef or puppet, which are based on Ruby. For example, Weaver supports chef based scripting, and integrates in a convenient way. A distributed system does not all have to be written in the same language. Thomas wrote: I don't fully get this idea of HOT consuming a monolithic model produced by some compiler - be it Weaver or anything else. I thought the goal was to develop HOT in a way that users can actually write HOT, as opposed to having to use some compiler to produce some useful model. So wouldn't it make sense to make sure we add the right concepts to HOT to make sure we are able to express what we want to express and have things like composability, re-use, substitutability? I am generally suspicious of analogies, but let me offer one here. In the realm of programming languages, many have great features for modularity within one source file. These features are greatly appreciated and used. But that does not stop people from wanting to maintain sources factored into multiple files. Back to the world at hand, I do not see a conflict between (1) making a language for monoliths with sophisticated internal structure and (2) defining one or more languages for non-monolithic sources. Thomas wrote: As said in my comment above, I would like to see us focusing on the agreement of one language - HOT - instead of yet another DSL. There are things out there that are well established (like chef or puppet), and HOT should be able to efficiently and intuitively use those things and orchestrate components built using those things. Yes, it may be that our best tactic at this point is to allow multiple (2), some or all not defined through the OpenStack Foundation, while agreeing here on (1). Thomas wrote: Anyway, this might be off the track that was originally discussed in this thread (i.e. holistic scheduling and so on) ... We are engaged in a boundary-drawing and relationship-drawing exercise. I brought up this idea of a software orchestration compiler to show why I think the software orchestration preparation stage is best done earlier rather than later. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
Debo, Yathi: I have read https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1 and most of the referenced materials, and I have a couple of big-picture questions. That document talks about making Nova call out to something that makes the sort of smart decisions you and I favor. As far as I know, Nova is still scheduling one thing at a time. How does that smart decision maker get a look at the whole pattern/template/topology as soon as it is needed? I think you intend that the smart guy gets it first, before Nova starts getting individual VM calls, right? How does this picture grow to the point where the smart guy is making joint decisions about compute, storage, and network? I think the key idea has to be that the smart guy gets a look at the whole problem first, and makes its decisions, before any individual resources are requested from nova/cinder/neutron/etc. I think your point about "non-disruptive, works with the current nova architecture" is about solving the problem of how the smart guy's decisions get into nova. Presumably this problem will occur for cinder and so on, too. Have I got this right? There is another way, right? Today Nova accepts an 'availability zone' argument whose value can specify a particular host. I am not sure about Cinder, but you can abuse volume types to get this job done. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Basic questions about climate
Climate is about reserving resources. Are those physical resources or virtual ones? Where was I supposed to read the answer to basic questions like that? If climate is about reserving virtual resources, how is that different from scheduling them? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)
Let me elaborate a little on my thoughts about software orchestration, and respond to the recent mails from Zane and Debo. I have expanded my picture at https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U and added a companion picture at https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g that shows an alternative. One of the things I see going on is discussion about better techniques for software orchestration than are supported in plain CFN. Plain CFN allows any script you want in userdata, and prescription of certain additional setup elsewhere in cfn metadata. But it is all mixed together and very concrete. I think many contributors would like to see something with more abstraction boundaries, not only within one template but also the ability to have modular sources. I work closely with some colleagues who have a particular software orchestration technology they call Weaver. It takes as input for one deployment not a single monolithic template but rather a collection of modules. Like higher level constructs in programming languages, these have some independence and can be re-used in various combinations and ways. Weaver has a compiler that weaves together the given modules to form a monolithic model. In fact, the input is a modular Ruby program, and the Weaver compiler is essentially running that Ruby program; this program produces the monolithic model as a side effect. Ruby is a pretty good language in which to embed a domain-specific language, and my colleagues have done this. The modular Weaver input mostly looks declarative, but you can use Ruby to reduce the verboseness of, e.g., repetitive stuff --- as well as plain old modularity with abstraction. We think the modular Weaver input is much more compact and better for human reading and writing than plain old CFN. This might not be obvious when you are doing the hello world example, but when you get to realistic examples it becomes clear. The Weaver input discusses infrastructure issues, in the rich way Debo and I have been advocating, as well as software. For this reason I describe it as an integrated model (integrating software and infrastructure issues). I hope for HOT to evolve to be similarly expressive to the monolithic integrated model produced by the Weaver compiler. In Weaver, as well as in some of the other software orchestration technologies being discussed, there is a need for some preparatory work before the infrastructure (e.g., VMs) is created. This preparatory stage begins the implementation of the software orchestration abstractions. Here is the translation from something more abstract into flat userdata and other cfn metadata. For Weaver, this stage also involves some stack-specific setup in a distinct coordination service. When the VMs finally run their userdata, the Weaver-generated scripts there use that pre-configured part of the coordination service to interact properly with each other. I think that, to a first-order approximation, the software orchestration preparatory stage commutes with holistic infrastructure scheduling. They address independent issues, and can be done in either order. That is why I have added a companion picture; the two pictures show the two orders. My claim of commutativity is limited, as I and colleagues have demonstrated only one of the two orderings; the other is just a matter of recent thought. There could be gotchas lurking in there. 
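To illustrate the weaving idea without pretending to show Weaver itself, here is a toy sketch in Python: each module contributes named resources, possibly referring to names exported by other modules, and the compiler merges them into one monolithic model and checks that every reference resolves. The module contents and the {"Ref": ...} convention are my own stand-ins, not Weaver's input language.

def weave(modules):
    monolith = {"Resources": {}}
    for module in modules:
        for name, res in module.get("Resources", {}).items():
            if name in monolith["Resources"]:
                raise ValueError("duplicate resource name: " + name)
            monolith["Resources"][name] = res
    # Verify that every cross-module reference ({"Ref": name}) now resolves.
    def check(value):
        if isinstance(value, dict):
            if list(value) == ["Ref"]:
                if value["Ref"] not in monolith["Resources"]:
                    raise ValueError("unresolved reference: " + value["Ref"])
            else:
                for v in value.values():
                    check(v)
        elif isinstance(value, list):
            for v in value:
                check(v)
    check(monolith["Resources"])
    return monolith

db_module = {"Resources": {"db": {"Type": "OS::Nova::Server"}}}
web_module = {"Resources": {"web": {"Type": "OS::Nova::Server",
                                    "Properties": {"backend": {"Ref": "db"}}}}}
print(weave([db_module, web_module]))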
Between the two orderings, I have a preference for the one I first mentioned and have experience with actually running. It has the virtue of keeping related things closer together: the software orchestration compiler is next to the software orchestration preparatory stage, and the holistic infrastructure scheduling is next to the infrastructure orchestration. In response to Debo's remark about flexibility: I am happy to see an architecture that allows either ordering if it turns out that they are both viable and the community really wants that flexibility. I am not so sure we can totally give up on architecting where things go, but this level of flexibility I can understand and get behind (provided it works). Just as a LP solver is a general utility whose uses do not require architecting, I can imagine a higher level utility that solves abstract placement problems. Actually, this is not a matter of imagination. My group has been evolving such a thing for years. It is now based, as Debo recommends, on a very flexible and general optimization algorithm. But the plumbing between it and the rest of the system is significant; I would not expect many users to take on that magnitude of task. I do not really want to get into dogmatic fights over what gets labelled heat. I will leave the questions about which piece goes where in the OpenStack programs and projects to those more informed and anointed. What I am trying to
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
I was not trying to raise issues of geographic dispersion and other higher level structures, I think the issues I am trying to raise are relevant even without them. This is not to deny the importance, or relevance, of higher levels of structure. But I would like to first respond to the discussion that I think is relevant even without them. I think it is valuable for OpenStack to have a place for holistic infrastructure scheduling. I am not the only one to argue for this, but I will give some use cases. Consider Hadoop, which stresses the path between Compute and Block Storage. In the usual way of deploying and configuring Hadoop, you want each data node to be using directly attached storage. You could address this by scheduling one of those two services first, and then the second with constraints from the first --- but the decisions made by the first could paint the second into a corner. It is better to be able to schedule both jointly. Also consider another approach to Hadoop, in which the block storage is provided by a bank of storage appliances that is equidistant (in networking terms) from all the Compute. In this case the Storage and Compute scheduling decisions have no strong interaction --- but the Compute scheduling can interact with the network (you do not want to place Compute in a way that overloads part of the network). Once a holistic infrastructure scheduler has made its decisions, there is then a need for infrastructure orchestration. The infrastructure orchestration function is logically downstream from holistic scheduling. I do not favor creating a new and alternate way of doing infrastructure orchestration in this position. Rather I think it makes sense to use essentially today's heat engine. Today Heat is the only thing that takes a holistic view of patterns/topologies/templates, and there are various pressures to expand the mission of Heat. A marquee expansion is to take on software orchestration. I think holistic infrastructure scheduling should be downstream from the preparatory stage of software orchestration (the other stage of software orchestration is the run-time action in and supporting the resources themselves). There are other pressures to expand the mission of Heat too. This leads to conflicting usages for the word heat: it can mean the infrastructure orchestration function that is the main job of today's heat engine, or it can mean the full expanded mission (whatever you think that should be). I have been mainly using heat in that latter sense, but I do not really want to argue over naming of bits and assemblies of functionality. Call them whatever you want. I am more interested in getting a useful arrangement of functionality. I have updated my picture at https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U --- do you agree that the arrangement of functionality makes sense? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
Someone earlier asked for greater clarity about infrastructure orchestration, so here is my view. I see two main issues: (1) deciding the order in which to do things, and (2) doing them in an acceptable order. That's an oversimplified wording because, in general, some parallelism is possible. In general, the set of things to do is constrained by a partial order --- and that partial order comes from two sources. One is the nature of the downstream APIs. For examples, you can not attach a volume or floating IP address to a VM until after both have been created. The other source of ordering constraints is upstream decision makers. Decisions made upstream are conveyed into today's heat engine by data dependencies between resources in a heat template. The heat engine is not making those decisions. It is not a source of important ordering constraints. When the ordering constraints actually allow some parallelism --- they do not specify a total order --- the heat engine has freedom in which of that parallelism to exploit vs flatten into sequential ordering. What today's heat engine does is make its available choices about that and issue the operations, keeping track of IDs and outcomes. I have been using the term infrastructure orchestration to refer to this latter job (issuing infrastructure operations with acceptable ordering/parallelism), not the decision-making of upstream agents. This might be confusing; I think the plain English meaning of orchestration suggests decision-making as well as execution. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
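Here is a minimal sketch of the ordering job just described: given a set of resources and their dependency constraints, compute waves such that everything in a wave can be issued in parallel and everything it depends on has already been created. The resource names and the shape of the depends_on map are illustrative, not Heat's internal representation.

def execution_waves(resources, depends_on):
    remaining = set(resources)
    done = set()
    waves = []
    while remaining:
        ready = {r for r in remaining
                 if all(d in done for d in depends_on.get(r, ()))}
        if not ready:
            raise ValueError("dependency cycle among: %s" % remaining)
        waves.append(sorted(ready))   # everything in one wave may run in parallel
        done |= ready
        remaining -= ready
    return waves

deps = {"attachment": ["server", "volume"], "floating_ip": ["server"]}
print(execution_waves(["server", "volume", "attachment", "floating_ip"], deps))
# [['server', 'volume'], ['attachment', 'floating_ip']]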
Re: [openstack-dev] Medium Availability VMs
From: Tim Bell tim.b...@cern.ch ... Is this something that will be added into OpenStack or made available as open source through something like stackforge ? I and some others think that the OpenStack architecture should have a place for holistic infrastructure scheduling. I also think this is an area where vendors will want to compete; I think my company has some pretty good technology for this and will want to sell it for money. https://wiki.openstack.org/wiki/Open requires that the free OpenStack includes a pretty good implementation of this function too, and I think others have some they want to contribute. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
I have written a new outline of my thoughts, you can find it at https://docs.google.com/document/d/1RV_kN2Io4dotxZREGEks9DM0Ih_trFZ-PipVDdzxq_E It is intended to stand up better to independent study. However, it is still just an outline. I am still learning about stuff going on in OpenStack, and am learning and thinking faster than I can write. Trying to figure out how to cope. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Fwd: [Openstack-devel] PGP key signing party during the HK summit
What's the threat model here? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Heat] How the autoscale API should control scaling in Heat
I'd like to try to summarize this discussion, if for no other reason than to see whether I have correctly understood it. There is a lot of consensus, but I haven't heard from Adrian Otto since he wrote some objections. I'll focus on trying to describe the consensus; Adrian's concerns are already collected in a single message. Or maybe this is already written up in one place? The consensus is that there should be an autoscaling (AS) service that is accessible via its own API. This autoscaling service can scale anything describable by a snippet of Heat template (it's not clear to me exactly what sort of syntax this is; is it written up anywhere?). The autoscaling service is stimulated into action by a webhook call. The user has the freedom to arrange calls on that webhook in any way she wants. It is anticipated that a common case will be alarms raised by Ceilometer. For more specialized or complicated logic, the user is free to wire up anything she wants to call the webhook. An instance of the autoscaling service maintains an integer variable, which is the current number of copies of the thing being autoscaled. Does the webhook call provide a new number, or a +1/-1 signal, or ...? There was some discussion of a way to indicate which individuals to remove, in the case of decreasing the multiplier. I suppose that would be an option in the webhook, and one that will not be exercised by Ceilometer alarms. (It seems to me that there is not much auto in this autoscaling service --- it is really a scaling service driven by an external controller. This is not a criticism, I think this is a good factoring --- but maybe not the best naming.) The autoscaling service does its job by multiplying the heat template snippet (the thing to be autoscaled) by the current number of copies and passing this derived template to Heat to make it so. As the desired number of copies changes, the AS service changes the derived template that it hands to Heat. Most commentators argue that the consistency and non-redundancy of making the AS service use Heat outweigh the extra path-length compared to a more direct solution. Heat will have a resource type, analogous to AWS::AutoScaling::AutoScalingGroup, through which the template author can request usage of the AS service. OpenStack in general, and Heat in particular, need to be much better at traceability and debuggability; the AS service should be good at these too. Have I got this right? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
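Purely to check my understanding, here is a minimal sketch of that consensus in Python: the AS service keeps the integer, the webhook adjusts it, and the service re-derives the template and hands it to Heat. The class, the webhook signature, and heat_update are placeholders of mine, not proposed interfaces.

class ScalingGroup(object):
    def __init__(self, snippet, min_size, max_size):
        self.snippet = snippet      # the thing being scaled, as a resource dict
        self.min_size = min_size
        self.max_size = max_size
        self.count = min_size

    def derived_template(self):
        # Multiply the snippet by the current count to get the stack Heat sees.
        return {"Resources": {"member-%d" % i: dict(self.snippet)
                              for i in range(self.count)}}

    def webhook(self, delta):
        # E.g. a Ceilometer alarm wired to call webhook(+1) or webhook(-1).
        self.count = max(self.min_size, min(self.max_size, self.count + delta))
        heat_update(self.derived_template())    # placeholder for the Heat call

def heat_update(template):
    print("updating stack with %d members" % len(template["Resources"]))

group = ScalingGroup({"Type": "OS::Nova::Server"}, min_size=1, max_size=5)
group.webhook(+1)    # -> updating stack with 2 members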
Re: [openstack-dev] [Heat] How the autoscale API should control scaling in Heat
radix, thanks. How exactly does the cooldown work? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Medium Availability VMs
From: Tim Bell tim.b...@cern.ch ... Discussing with various people in the community, there seems to be interest in a way to - Identify when a hypervisor is being drained or is down and inventory its VMs - Find the best practise way of restarting that VM for hypervisors still available o Live migration o Cold migration - Defining policies for the remaining cases o Restart from base image o Suspend o Delete This touches multiple components from Nova/Cinder/Quantum (at minimum). It also touches some cloud architecture questions if OpenStack can start to move into the low hanging fruit parts of service consolidation. I’d like to have some form of summit discussion in Hong Kong around these topics but it is not clear where it fits. Are there others who feel similarly ? How can we fit it in ? When there are multiple viable choices, I think direction should be taken from higher layers. The operation of draining a hypervisor can be parameterized, the VMs themselves can be tagged, by an indication of which to do. I myself am working primarily on holistic infrastructure scheduling, which includes quiescing and draining hypervisors among the things it can do. Holistic scheduling works under the direction of a template/pattern/topology that describes a set of interacting resources and their relationships, and so is able to make a good decision about where VMs should move to. Re-starting a VM can require software coordination. I think holistic infrastructure scheduling is logically downstream from software coordination and upstream from infrastructure orchestration. I think the ambitions for Heat are expanding to include the latter two, and so must also have something to do with holistic infrastructure scheduling. Regards, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
Manuel, and others: I am sorry, in the rush at the end of the scheduler meeting a critical fact flew from my mind: the material I distributed beforehand was intended as something I could reference during discussion in the meeting, I did not expect it to fully stand on its own. Indeed, you have noticed that it does not. It will take a little more time to write something that stands on its own. I will try to get something out soon, including answers to your questions. I should also make clear the overall sense of what I am doing. I am in an in-between state. My group has some running code on which I can report, but we are not satisfied with it for a few reasons. One is that it is not integrated yet in any way with Heat, and I think the discussion we are having here overlaps with Heat. Another is that it does not support very general changes, we have so far been solving initial deployment issues. We have been thinking about how to do better on these issues, and have an outline and are proceeding with the work; I can report on these too. The things that concern me the most are issues of how to get architectural alignment with what the OpenStack community is doing. So my main aim right now is to have a discussion of how the pieces fit together. I am told that the OpenStack community likes to focus on small incremental changes, and that is a way to get things done, but I, at least, would like to get some sense of where this is going. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [heat] cross-stack references
When we get into things like affinity concerns or managing network bandwidth, we see the need for cross-stack relationships. You may want to place parts of a new stack near parts of an existing one, for example. I see that in CFN you can make cross-references between different parts of a single stack using the resource names that appear in the original template. Is there a way to refer to something that did not come from the same original template? If not, won't we need such a thing to be introduced? Any thoughts on how that would be done? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] cross-stack references
My question is about stacks that are not nested. Suppose, for example, that I create a stack that implements a shared service. Later I create a separate stack that uses that shared service. When creating that client stack, I would like to have a way of talking about its relationships with the service stack. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
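One thing that can be done today, without any new cross-stack reference mechanism, is to read the service stack's outputs and feed them in as parameters of the client stack. A rough sketch follows; it assumes the standard Heat v1 REST paths, a pre-obtained Keystone token, and a service-stack output named "endpoint", all of which are placeholders for illustration rather than a recommended interface.

    import json
    import requests

    # Placeholders for illustration: HEAT_URL already includes the tenant id,
    # TOKEN is a valid Keystone token, and the service stack defines an output
    # whose key is "endpoint".
    HEAT_URL = "http://heat.example.com:8004/v1/TENANT_ID"
    HEADERS = {"X-Auth-Token": "TOKEN", "Content-Type": "application/json"}

    def stack_outputs(stack_name):
        """Return {output_key: output_value} for an existing stack."""
        r = requests.get("%s/stacks/%s" % (HEAT_URL, stack_name), headers=HEADERS)
        r.raise_for_status()
        outputs = r.json()["stack"].get("outputs", [])
        return dict((o["output_key"], o["output_value"]) for o in outputs)

    def create_client_stack(name, template_body, service_stack="shared-service"):
        """Create a stack whose parameters are wired to another stack's outputs."""
        service = stack_outputs(service_stack)
        body = {
            "stack_name": name,
            "template": template_body,   # the client stack's HOT/CFN template
            "parameters": {"service_endpoint": service["endpoint"]},
        }
        r = requests.post("%s/stacks" % HEAT_URL, headers=HEADERS,
                          data=json.dumps(body))
        r.raise_for_status()
        return r.json()

That keeps the relationship outside of either template, though, which is exactly why I am asking whether a first-class way to talk about it is needed.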
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
Fixed, sorry.

From: Gary Kotton gkot...@vmware.com
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/17/2013 03:26 AM
Subject: Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

Hi, The document is locked. Thanks Gary

From: Mike Spreitzer mspre...@us.ibm.com
Reply-To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Date: Tuesday, September 17, 2013 8:00 AM
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

I have written a brief document, with pictures. See https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Change email address
Is it possible to change the email address I use in git and gerrit? I think I started off with an inferior choice. I have now taught Launchpad and Gerrit that I have two email addresses. The OpenStack Foundation appears a bit confused, but I'm hoping that's not critical. I am stuck at the point on https://wiki.openstack.org/wiki/How_To_Contribute where it says, concerning signing the ICLA, "Your full name and E-mail address will be public (...) and the latter needs to match the user.email in your Git configuration." Gerrit knows that I have signed the ICLA, and will not let me sign it again (I cannot even try; it is grayed out). Would it be correct to clarify the text I quoted above to say that one of your Gerrit email addresses has to match the one in your Git configuration? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Change email address
Thanks Anne. Since I have already signed the ICLA, my real question is about what has to be true on an on-going basis for me to do developer stuff like reviewing and submitting patches. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Change email address, or, why I can't use github and will I be able to submit patches?
I am working through the instructions at https://wiki.openstack.org/wiki/GerritWorkflow - and things are going OK, including installing ~/.ssh/id_rsa.pub at https://review.openstack.org/#/settings/ssh-keys, without any linebreaks in the middle nor at the end - except it fails at the point where I test my ability to use github:

mjs9:~ mspreitz$ git config --list
user.name=Mike
user.email=mspre...@us.ibm.com
core.editor=emacs
mjs9:~ mspreitz$ ssh -T g...@github.com
Warning: Permanently added the RSA host key for IP address '192.30.252.131' to the list of known hosts.
Permission denied (publickey).

What's going wrong here? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Change email address, or, why I can't use github and will I be able to submit patches?
From: Anne Gentle annegen...@justwriteclick.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Date: 09/17/2013 05:51 PM Subject: Re: [openstack-dev] Change email address, or, why I can't use github and will I be able to submit patches? ... Github was experiencing issues earlier today. Nothing in our GerritWorkflow requires ssh -T g...@github.com though. If you were able to do a git clone, how did git review -s go for you? Both work. So I guess I am in business. Thanks! ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [savanna] Program name and Mission statement
"Data processing" is surely a superset of "big data". Either, by itself, is way too vague. But the wording that many people favor, which I will quote again, uses the vague term in a qualified way that makes it appropriately specific, IMHO. Here is the wording again: ``To provide a simple, reliable and repeatable mechanism by which to deploy Hadoop and related Big Data projects, including management, monitoring and processing mechanisms driving further adoption of OpenStack.'' I think that saying "related Big Data projects" after "Hadoop" is fairly clear. OTOH, I would not mind replacing "Hadoop and related Big Data projects" with "the Hadoop ecosystem". Regards, Mike

Matthew Farrellee m...@redhat.com wrote on 09/16/2013 02:39:20 PM:
From: Matthew Farrellee m...@redhat.com
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/16/2013 02:40 PM
Subject: Re: [openstack-dev] [savanna] Program name and Mission statement

IMHO, Big Data is even more nebulous and currently being pulled in many directions. Hadoop-as-a-Service may be too narrow. So, something in between, such as Data Processing, is a good balance. Best, matt

On 09/13/2013 08:37 AM, Abhishek Lahiri wrote: IMHO data processing is too broad; it makes more sense to clarify this program as big data as a service or simply openstack-Hadoop-as-a-service. Thanks Regards Abhishek Lahiri
On Sep 12, 2013, at 9:13 PM, Nirmal Ranganathan rnir...@gmail.com wrote:
On Wed, Sep 11, 2013 at 8:39 AM, Erik Bergenholtz ebergenho...@hortonworks.com wrote:
On Sep 10, 2013, at 8:50 PM, Jon Maron jma...@hortonworks.com wrote: Openstack Big Data Platform
On Sep 10, 2013, at 8:39 PM, David Scott david.sc...@cloudscaling.com wrote: I vote for 'Open Stack Data'
On Tue, Sep 10, 2013 at 5:30 PM, Zhongyue Luo zhongyue@intel.com wrote: Why not OpenStack MapReduce? I think that pretty much says it all?
On Wed, Sep 11, 2013 at 3:54 AM, Glen Campbell g...@glenc.io wrote: performant isn't a word. Or, if it is, it means having performance. I think you mean high-performance.
On Tue, Sep 10, 2013 at 8:47 AM, Matthew Farrellee m...@redhat.com wrote: Rough cut - Program: OpenStack Data Processing Mission: To provide the OpenStack community with an open, cutting edge, performant and scalable data processing stack and associated management interfaces.
Proposing a slightly different mission: To provide a simple, reliable and repeatable mechanism by which to deploy Hadoop and related Big Data projects, including management, monitoring and processing mechanisms driving further adoption of OpenStack.
+1. I liked the data processing aspect as well, since EDP api directly relates to that, maybe a combination of both.
On 09/10/2013 09:26 AM, Sergey Lukjanov wrote: It sounds too broad IMO. Looks like we need to define Mission Statement first. Sincerely yours, Sergey Lukjanov Savanna Technical Lead Mirantis Inc.
On Sep 10, 2013, at 17:09, Alexander Kuznetsov akuznet...@mirantis.com wrote: My suggestion OpenStack Data Processing.
On Tue, Sep 10, 2013 at 4:15 PM, Sergey Lukjanov slukja...@mirantis.com wrote: Hi folks, due to the Incubator Application we should prepare Program name and Mission statement for Savanna, so I want to start a mailing thread about it. Please provide any ideas here. P.S. List of existing programs: https://wiki.openstack.org/wiki/Programs P.P.S.
Re: [openstack-dev] [Tuskar] Tuskar Names Clarification Unification
From: Jaromir Coufal jcou...@redhat.com
To: openstack-dev@lists.openstack.org,
Date: 09/16/2013 11:51 AM
Subject: Re: [openstack-dev] [Tuskar] Tuskar Names Clarification Unification

Hi, after a few days of gathering information, it looks like no new ideas are appearing, so let's take the last round of voting for the names you prefer. It's important for us to get on the same page.

I am concerned that the proposals around the term 'rack' do not recognize that there might be more than one layer in the organization. Is it more important to get appropriately abstract and generic terms, or is the desire to match common concrete terms? Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
I have written a brief document, with pictures. See https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
I've read up on recent goings-on in the scheduler subgroup, and have some thoughts to contribute. But first I must admit that I am still a newbie to OpenStack, and still am missing some important clues. One thing that mystifies me is this: I see essentially the same thing, which I have generally taken to calling holistic scheduling, discussed in two mostly separate contexts: (1) the (nova) scheduler context, and (2) the ambitions for heat. What am I missing?

I have read the Unified Resource Placement Module document (at https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1# ) and NovaSchedulerPerspective document (at https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu ). My group already has running code along these lines, and thoughts for future improvements, so I'll mention some salient characteristics. I have read the etherpad at https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions - and I hope my remarks will help fit these topics together.

Our current code uses one long-lived process to make placement decisions. The information it needs to do this job is pro-actively maintained in its memory. We are planning to try replacing this one process with a set of equivalent processes, not sure how well it will work out (we are a research group).

We make a distinction between desired state, target state, and observed state. The desired state comes in through REST requests, each giving a full virtual resource topology (VRT). A VRT includes constraints that affect placement, but does not include actual placement decisions. Those are made by what we call the placement agent. Yes, it is separate from orchestration (even in the first architecture figure in the u-rpm document the orchestration is separate --- the enclosing box does not abate the essential separateness). In our architecture, orchestration is downstream from placement (as in u-rpm). The placement agent produces target state, which is essentially desired state augmented by placement decisions. Observed state is what comes from the lower layers (Software Defined Compute, Storage, and Network). We mainly use OpenStack APIs for the lower layers, and have added a few local extensions to make the whole story work.

The placement agent judges available capacity by subtracting current allocations from raw capacity. The placement agent maintains in its memory a derived thing we call effective state; the allocations in effective state are the union of the allocations in target state and the allocations in observed state. Since the orchestration is downstream, some of the planned allocations are not in observed state yet. Since other actors can use the underlying cloud, and other weird sh*t happens, not all the allocations are in target state. That's why placement is done against the union of the allocations. This is somewhat conservative, but the alternatives are worse.

Note that placement is concerned with allocations rather than current usage. Current usage fluctuates much faster than you would want placement to. Placement needs to be done with a long-term perspective. Of course, that perspective can be informed by usage information (as well as other sources) --- but it remains a distinct thing.

We consider all our copies of observed state to be soft --- they can be lost and reconstructed at any time, because the true source is the underlying cloud. Which is not to say that reconstructing a copy is cheap. 
We prefer making incremental updates as needed, rather than re-reading the whole thing. One of our local extensions adds a mechanism by which a client can register to be notified of changes in the Software Defined Compute area. The target state, on the other hand, is stored authoritatively by the placement agent in a database.

We pose placement as a constrained optimization problem, with a non-linear objective. We approximate its solution with a very generic algorithm; it is easy to add new kinds of constraints and new contributions to the objective. The core placement problem is about packing virtual resources into physical containers (e.g., VMs into hosts, volumes into Cinder backends). A virtual resource has a demand vector, and a corresponding container has a capacity vector of the same length. For a given container, the sum of the demand vectors of the virtual resources in that container cannot exceed the container's capacity vector in any dimension. We can add dimensions as needed to handle the relevant host/guest characteristics.

We are just now working through an example where a Cinder volume can be required to be the only one hosted on whatever Cinder backend hosts it. This is exactly analogous to requiring that a VM (bare metal or otherwise) be the only one hosted by whatever PM hosts it. We favor a fairly expressive language for stating desired
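To make the vector packing concrete, here is a small illustrative sketch (not our actual code; the data layout is invented for the example) of two pieces described above: forming effective state as the union of the target-state and observed-state allocations, and the per-container test that the sum of demand vectors must not exceed the capacity vector in any dimension.

    # Illustrative only; demands and capacities are plain tuples, and
    # allocations are dicts keyed by virtual-resource id.

    def effective_allocations(target, observed):
        """Union of target-state and observed-state allocations.

        A resource present in both contributes once; here we conservatively keep
        the larger demand in each dimension (an assumption for the sketch -- in
        practice the two copies should normally agree).
        """
        effective = dict(observed)
        for rid, demand in target.items():
            if rid in effective:
                effective[rid] = tuple(max(a, b)
                                       for a, b in zip(effective[rid], demand))
            else:
                effective[rid] = demand
        return effective

    def fits(capacity, resident_demands, candidate_demand):
        """True iff the candidate can be packed into the container.

        The sum of the demand vectors of everything in the container, including
        the candidate, must not exceed the capacity vector in any dimension.
        """
        totals = list(candidate_demand)
        for demand in resident_demands:
            totals = [t + d for t, d in zip(totals, demand)]
        return all(t <= c for t, c in zip(totals, capacity))

    # Example with dimensions (vcpus, ram_gb):
    host_capacity = (16, 64)
    already_placed = [(4, 8), (4, 16)]
    print(fits(host_capacity, already_placed, (8, 32)))  # True
    print(fits(host_capacity, already_placed, (8, 48)))  # False: 72 GB > 64 GB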
Re: [openstack-dev] [heat] [scheduler] (How to talk about) Bringing things together for Icehouse
As I mentioned the last time this was brought up, I already have a meeting series that conflicts with the scheduler group chats and will be hard to move; that is why I have been trying to participate asynchronously. But since Gary asked again, I am seeing what I can do about that other meeting series. Unless and until something gives, I will have to continue participating asynchronously. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse
From: Gary Kotton gkot...@vmware.com ... Can you please join us at the upcoming scheduler meeting? That will give you a chance to bring up the ideas and discuss them with a larger audience. I will do so on Sep 17. Later meetings still TBD. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [savanna] Host information for non admin users - from a holistic scheduler
Alex, my understanding is that the motivation for rack-awareness in Hadoop is optimizing availability rather than networking. The good news, for those of us who favor a holistic scheduler, is that it can take both sorts of things into account when/where desired.

Yes, the case of a public cloud is more difficult than the case of a private cloud. My understanding of Amazon's attitude, for example, is that they do not want to give out any bits of information about placement --- even though there are known techniques to reverse-engineer it, Amazon does not want to help that along at all. Giving out obscured information --- some bits but not all --- is still disfavored.

Let me give a little background on how my group deals with placement for availability, then discuss options for the public cloud. Our holistic scheduler takes as input something we call a virtual resource topology (VRT); other people use words like pattern, template, application, and cluster for such a thing. It is an assembly of virtual resources that one tenant wants to instantiate. In a VRT the resources are arranged into a tree of groups; the VRT itself is the root. We use the groups for concise statements of various sorts, which I will omit here for the sake of simplicity.

As far as direct location constraints are concerned, there is just one primitive thing: it is a relationship between two virtual resources and it is parameterized by a sense (positive or negative) and a level in the physical hierarchy (e.g., physical machine (PM), chassis, rack). Thus: a negative relationship between VM1 and VM2 at the rack level means that VM1 and VM2 must go on different racks; a positive relationship between VM3 and VM4 at the PM level means those two VMs must be on the same host. Additionally, each constraint can be hard or soft: a hard constraint must be satisfied while a soft constraint is a preference.

Consider the example of six interchangeable VMs (say VM1, VM2, ... VM6) that should be spread across at least two racks with no more than half the VMs on any one rack. How to say that with a collection of location primitives? What we do is establish three rack-level anti-co-location constraints: one between VM1 and VM2, one between VM3 and VM4, and one between VM5 and VM6. That is not the most obvious representation. You might have expected this: nine rack-level anti-co-location constraints, one for every pair in the outer product between {VM1, VM2, VM3} and {VM4, VM5, VM6}. Now consider what happens if the physical system has three racks and room for only two additional VMs on each rack. With the latter set of constraints, there is no acceptable placement. With the sparser set that we use, there are allowed placements. In short, an obvious set of constraints may rule out otherwise acceptable placement.

I see two ways to give guaranteed-accurate rack awareness to Hadoop: constrain the placement so tightly that you know enough to configure Hadoop before the placement decision is made, or extract placement information after the placement decision is made. The public cloud setting rules out the latter, leaving only the former. This can be done, at a cost of suffering pattern rejections that would not occur if you did not have to over-constrain the placement.

One more option is to give up on guaranteed accuracy: prescribe a placement with sufficient precision to inform Hadoop, and so inform Hadoop, but make that prescription a preference rather than a hard constraint. 
If the actual placement does not fully meet all the preferences, Hadoop is not informed of the differences and so will suffer in non-functional ways but still get the job done (modulo all those non-functional considerations, like tolerating a rack crash). When your preferences are not met, it is because the system is very loaded and your only choice is between operating in some degraded way or not at all --- you might as well take the degraded operation. Regards, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
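To make the six-VM example above concrete, here is a tiny brute-force check, purely for illustration (it is not our placement algorithm), showing that with three racks of two free slots each, the three sparse rack-level anti-co-location constraints admit a placement while the nine outer-product constraints do not.

    from itertools import product

    RACKS = ("r1", "r2", "r3")
    SLOTS_PER_RACK = 2
    VMS = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6")

    # Rack-level anti-co-location constraints: pairs of VMs that must land
    # on different racks.
    SPARSE = [("vm1", "vm2"), ("vm3", "vm4"), ("vm5", "vm6")]
    DENSE = [(a, b) for a in ("vm1", "vm2", "vm3") for b in ("vm4", "vm5", "vm6")]

    def feasible(constraints):
        """Is there any assignment of VMs to racks meeting capacity + constraints?"""
        for assignment in product(RACKS, repeat=len(VMS)):
            placement = dict(zip(VMS, assignment))
            if any(assignment.count(r) > SLOTS_PER_RACK for r in RACKS):
                continue
            if any(placement[a] == placement[b] for a, b in constraints):
                continue
            return True
        return False

    print(feasible(SPARSE))  # True, e.g. r1:{vm1,vm3} r2:{vm2,vm5} r3:{vm4,vm6}
    print(feasible(DENSE))   # False: the two triples would need four racks in total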
Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint
We are currently explicitly considering location and space. For example, a template can require that a volume be on a disk that is directly attached to the machine hosting the VM to which the volume is attached. Spinning-rust bandwidth is much trickier, because it is not something you can simply add up when you combine workloads. The IOPS, as well as the bytes/s, that a disk will deliver depend on the workload mix on that disk. While the disk may deliver X IOPS when serving only application A, and Y when serving only application B, you cannot conclude that it will serve (X+Y)/2 when serving (A+B)/2.

While we hope to do better in the future, we currently handle disk bandwidth in non-quantitative ways. One is that a template may request that a volume be placed such that it does not compete with any other volume (i.e., is the only one on its disk). Another is that a template may specify a type for a volume, which effectively maps to a Cinder volume type that has been pre-defined to correspond to a QoS defined in an enterprise storage subsystem. The choice between fast/expensive and slow/cheap storage is currently left to higher layers. That could be pushed down, supposing there is a suitably abstract yet accurate way of describing how the tradeoff choice should be made.

I think the Savanna people are on this list too, so I presume it's a good place for this discussion. Thanks, Mike

From: shalz sh...@hotmail.com
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/11/2013 09:55 PM
Subject: Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint

Mike, You mention "We are now extending that example to include storage, and we are also working examples with Hadoop." In the context of your examples / scenarios, do these placement decisions consider storage performance and capacity on a physical node? For example: based on application needs and IOPS/latency requirements, carving out SSD storage or a traditional spinning-disk block volume? Or, say, for cost-efficiency reasons, using SSD caching on Hadoop name nodes? I'm investigating a) per-node PCIe SSD deployment in an OpenStack/Hadoop environment and b) selected-node SSD caching, specifically for OpenStack Cinder. Hope this is the right forum to ask this question. rgds, S ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
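For the "does not compete with any other volume" request, one way to express it, shown purely as an illustration (I am not claiming this is how our code or Cinder does it), is to fold exclusivity into the same demand/capacity vectors used for packing: every volume consumes a sliver of a synthetic "exclusivity" dimension, and an exclusive volume consumes the backend's whole budget in that dimension.

    # Illustrative only: modelling "this volume must be alone on its disk"
    # in the same vector-packing terms used for ordinary capacity.

    EPSILON = 0.001  # every volume takes a sliver of the exclusivity dimension

    def volume_demand(size_gb, exclusive=False):
        # Dimensions: (capacity_gb, exclusivity)
        return (size_gb, 1.0 if exclusive else EPSILON)

    def fits(capacity, resident_demands, candidate_demand):
        totals = list(candidate_demand)
        for demand in resident_demands:
            totals = [t + d for t, d in zip(totals, demand)]
        return all(t <= c for t, c in zip(totals, capacity))

    backend = (2000, 1.0)                        # 2 TB disk, exclusivity budget 1.0
    shared = volume_demand(100)                  # an ordinary volume
    alone = volume_demand(500, exclusive=True)   # wants the disk to itself

    print(fits(backend, [], alone))        # True: an empty disk is acceptable
    print(fits(backend, [shared], alone))  # False: something is already there
    print(fits(backend, [alone], shared))  # False: the exclusive volume blocks others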
Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint
Gary Kotton gkot...@vmware.com wrote on 09/12/2013 05:40:59 AM:
From: Gary Kotton gkot...@vmware.com
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/12/2013 05:46 AM
Subject: Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint

Hi, for some reason I am unable to access your proposed talk. I am not 100% sure, but I think the voting may be closed. We have weekly scheduling meetings (https://wiki.openstack.org/wiki/Meetings#Scheduler_Sub-group_meeting). It would be nice if you could attend; it will give you a platform to raise and share ideas with the rest of the guys in the community. At the moment the scheduling subgroup is working on our ideas for the design summit sessions. Please see https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions Thanks Gary

Worse yet, I know of no way to navigate to a list of design summit proposals. What am I missing? The scheduler group meeting conflicts with another meeting that I already have and which will be difficult to move. I will see what I can do asynchronously. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [savanna] Host information for non admin users
From: Nirmal Ranganathan rnir...@gmail.com ... Not host capacity, just an opaque reference to distinguish a host is enough. Hadoop can use that information to appropriately place block replicas. For example, if the replication count is 3, and if a host/rack topology is provided to Hadoop, it will place each replica on a different host/rack, granted one is available.

What if there are more than three racks, but some are better choices than others (perhaps even some are ruled out) due to considerations of various sorts of capacity and usage? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
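To illustrate what I mean, here is a toy example (the numbers and the greedy rule are entirely made up): once capacity and usage are in play, the question is not just "a different rack per replica" but which racks, and some may be ruled out entirely.

    # Toy example only: pick racks for block replicas when racks differ in
    # remaining capacity; racks below a minimum headroom are ruled out.

    def pick_racks(free_slots_by_rack, replicas=3, min_free=1):
        candidates = dict((rack, free) for rack, free in free_slots_by_rack.items()
                          if free >= min_free)
        ranked = sorted(candidates, key=candidates.get, reverse=True)
        if len(ranked) < replicas:
            raise RuntimeError("not enough acceptable racks for %d replicas" % replicas)
        return ranked[:replicas]

    print(pick_racks({"rackA": 7, "rackB": 0, "rackC": 3, "rackD": 5}))
    # ['rackA', 'rackD', 'rackC'] -- rackB is ruled out, the rest ranked by headroom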
Re: [openstack-dev] [savanna] Program name and Mission statement
To provide a simple, reliable and repeatable mechanism by which to deploy Hadoop and related Big Data projects, including management, monitoring and processing mechanisms driving further adoption of OpenStack. That sounds like it is at about the right level of specificity. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [savanna] Program name and Mission statement
A quick dictionary lookup of "data processing" yields the following. I wonder if you mean something more specific.

data processing |ˈˌdædə ˈprɑsɛsɪŋ| noun: a series of operations on data, esp. by a computer, to retrieve, transform, or classify information.

From: Matthew Farrellee m...@redhat.com
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/10/2013 09:53 AM
Subject: Re: [openstack-dev] [savanna] Program name and Mission statement

Rough cut - Program: OpenStack Data Processing Mission: To provide the OpenStack community with an open, cutting edge, performant and scalable data processing stack and associated management interfaces.

On 09/10/2013 09:26 AM, Sergey Lukjanov wrote: It sounds too broad IMO. Looks like we need to define Mission Statement first. Sincerely yours, Sergey Lukjanov Savanna Technical Lead Mirantis Inc.
On Sep 10, 2013, at 17:09, Alexander Kuznetsov akuznet...@mirantis.com wrote: My suggestion OpenStack Data Processing.
On Tue, Sep 10, 2013 at 4:15 PM, Sergey Lukjanov slukja...@mirantis.com wrote: Hi folks, due to the Incubator Application we should prepare Program name and Mission statement for Savanna, so I want to start a mailing thread about it. Please provide any ideas here. P.S. List of existing programs: https://wiki.openstack.org/wiki/Programs P.P.S. https://wiki.openstack.org/wiki/Governance/NewPrograms Sincerely yours, Sergey Lukjanov Savanna Technical Lead Mirantis Inc. ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint
First, I'm a newbie here, wondering: is this the right place for comments/questions on blueprints? Supposing it is... I am referring to https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension

In my own research group we have experience with a few systems that do something like that, and more (as, indeed, that blueprint explicitly states that it is only the start of a longer roadmap). I would like to highlight a couple of differences that alarm me.

One is the general overlap between groups. I am not saying this is wrong, but as a matter of natural conservatism we have shied away from unnecessary complexities. The only overlap we have allowed so far is hierarchical nesting. As the instance-group-api-extension explicitly contemplates groups of groups as a later development, this would cover the overlap that we have needed.

On the other hand, we already have multiple policies attached to a single group. We have policies for a variety of concerns, so some can be combined completely or somewhat independently. We also have relationships (of various sorts) between groups, as well as between individuals, and between individuals and groups. The policies and relationships, in general, are not simply names but also have parameters. Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
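To make the comparison concrete, here is a rough strawman of the kind of structure I am describing. It is for discussion only, not the blueprint's API and not our internal schema; the names and parameters are invented.

    # Strawman data model, for discussion only.
    from dataclasses import dataclass, field
    from typing import Dict, List, Union

    @dataclass
    class Policy:
        kind: str                          # e.g. "anti-collocation"
        params: Dict[str, object] = field(default_factory=dict)  # e.g. {"level": "rack"}

    @dataclass
    class Group:
        name: str
        members: List[Union["Group", str]] = field(default_factory=list)  # nested groups or resource ids
        policies: List[Policy] = field(default_factory=list)              # several may apply at once

    @dataclass
    class Relationship:
        kind: str                          # e.g. "collocate", "depends-on"
        ends: List[Union[Group, str]] = field(default_factory=list)
        params: Dict[str, object] = field(default_factory=dict)

    web = Group("web", members=["vm1", "vm2"],
                policies=[Policy("anti-collocation", {"level": "rack", "hard": True})])
    db = Group("db", members=["vm3"],
               policies=[Policy("exclusive-host", {"hard": True})])
    app = Group("app", members=[web, db])                    # hierarchical nesting
    ties = Relationship("collocate", ends=[web, db],
                        params={"level": "zone", "hard": False})

The point is only that policies and relationships carry parameters, and that several policies can be attached to one group.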
Re: [openstack-dev] [savanna] Program name and Mission statement
Jon Maron jma...@hortonworks.com wrote on 09/10/2013 08:50:23 PM: From: Jon Maron jma...@hortonworks.com To: OpenStack Development Mailing List openstack-dev@lists.openstack.org, Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org Date: 09/10/2013 08:55 PM Subject: Re: [openstack-dev] [savanna] Program name and Mission statement Openstack Big Data Platform Let's see if you mean that. Does this project aim to cover big data things besides MapReduce? Can you give examples of other things that are in scope? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tripleo] Scaling of TripleO
Robert Collins robe...@robertcollins.net wrote on 09/06/2013 05:31:14 PM:
From: Robert Collins robe...@robertcollins.net
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org,
Date: 09/06/2013 05:36 PM
Subject: Re: [openstack-dev] [tripleo] Scaling of TripleO

... My vision for TripleO/undercloud and scale in the long term is:
- A fully redundant self-healing undercloud
- (implies self hosting)
...

Robert, what do you mean by self hosting? If a cloud can self-host, why do we need two clouds (under and over)? Thanks, Mike___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Frustrations with review wait times
Joshua, I do not think such strict and coarse scheduling is a practical way to manage developers, who have highly individualized talents, backgrounds, and interests. Regards, Mike ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Stats on blueprint design info / creation times
For the case of an item that has no significant doc of its own but is related to an extensive blueprint, how about linking to that extensive blueprint? ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev