Re: [openstack-dev] Change in openstack/heat[master]: Implement a Heat-native resource group

2013-10-17 Thread Mike Spreitzer
What is the rationale for this new feature?  Since there is already an 
autoscaling group implemented by Heat, what is the added benefit here? And 
why is it being done as another heat-native thing rather than as an 
independent service (e.g., as outlined in 
https://wiki.openstack.org/wiki/Heat/AutoScaling for an autoscaling group 
service)?

Thanks,
Mike


Re: [openstack-dev] Change in openstack/heat[master]: Implement a Heat-native resource group

2013-10-17 Thread Mike Spreitzer
Clint Byrum cl...@fewbar.com wrote on 10/17/2013 09:16:12 PM:

 Excerpts from Mike Spreitzer's message of 2013-10-17 17:19:58 -0700:
  What is the rationale for this new feature?  Since there is already an 
  autoscaling group implemented by Heat, what is the added benefit here? And 
  why is it being done as another heat-native thing rather than as an 
  independent service (e.g., as outlined in 
  https://wiki.openstack.org/wiki/Heat/AutoScaling for an autoscaling group 
  service)?
 
 This supports that design quite well.
 
 The point is to be able to group and clone any resource, not just
 server/instance. So autoscaling might be configured to manage a group
 of Trove database instances which are then fed as a list to a group of
 separately autoscaled webservers.

Thanks for the answer.  I'm just a newbie here, trying to understand 
what's going on.  I still don't quite follow.  
https://wiki.openstack.org/wiki/Heat/AutoScaling says that what's 
autoscaled is a set of resources, not just one.  Can there be dependencies 
among the resources in that set?  For example, is the intent that I could 
autoscale a pair of (DB server, web server) where the web server's 
properties depend on the DB server's attributes?  If so, would it be 
problematic to implement that in terms of a pair of Heat-native resource 
groups?

BTW, is there some place I could have read the answers to my questions 
about the design thinking here?

Thanks,
Mike


Re: [openstack-dev] [Heat] HOT Software configuration proposal

2013-10-16 Thread Mike Spreitzer
Steven Hardy sha...@redhat.com wrote on 10/16/2013 04:11:40 AM:
 ...
 IMO we should be abstracting the software configuration complexity 
behind a
 Heat resource interface, not pushing it up to a pre-processor (which
 implies some horribly complex interfaces at the heat template level)

I am not sure I follow.  Can you please elaborate on the horrible 
implication?

Thanks,
Mike


Re: [openstack-dev] [Heat] HOT Software configuration proposal

2013-10-16 Thread Mike Spreitzer
Zane Bitter zbit...@redhat.com wrote on 10/16/2013 10:30:44 AM:

 On 16/10/13 15:58, Mike Spreitzer wrote:
 ...
  Thanks for a great short sharp answer.  In that light, I see a concern.
  Once a workflow has been generated, the system has lost the ability to
  adapt to changes in either model.  In a highly concurrent and dynamic
  environment, that could be problematic.
 
 I think you're referring to the fact if reality diverges from the model 
 we have no way to bring it back in line (and even when doing an update, 
 things can and usually will go wrong if Heat's idea of the existing 
 template does not reflect reality any more). If so, then I agree that we 
 are weak in this area. You're obviously aware of 
 http://summit.openstack.org/cfp/details/95 so it is definitely on the radar.

Actually, I am thinking of both of the two models you mentioned.  We are 
only in the midst of implementing an even newer design (heat based), but 
for my group's old code we have a revised design in which the 
infrastructure orchestrator can react to being overtaken by later updates 
to the model we call "target state" (origin source is client) as well as 
concurrent updates to the model we call "observed state" (origin source is 
hardware/hypervisor).  I haven't yet decided what to recommend to the heat 
community, so I'm just mentioning the issue as a possible concern.

Thanks,
Mike


Re: [openstack-dev] Scheduler meeting and Icehouse Summit

2013-10-16 Thread Mike Spreitzer
Mike Wilson geekinu...@gmail.com wrote on 10/16/2013 07:13:17 PM:

 I need to understand better what holistic scheduling means, ...

By holistic I simply mean making a joint decision all at once about a 
bunch of related resources of a variety of types.  For example, making a 
joint decision about where to place a set of VMs and the Cinder volumes 
that will be attached to the VMs.
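
For concreteness, here is a rough sketch of the kind of joint request I 
have in mind (purely illustrative; the field names are made up and this is 
not an existing API):

    # One request describing several related resources of different types,
    # so a placement decision can be made for all of them together rather
    # than one resource at a time.
    joint_request = {
        "vms": [
            {"name": "db",  "flavor": "m1.large"},
            {"name": "web", "flavor": "m1.small"},
        ],
        "volumes": [
            {"name": "db-data", "size_gb": 100, "attach_to": "db"},
        ],
        "policies": [
            {"type": "proximity", "applies_to": ["db", "db-data"]},
        ],
    }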

Regards,
Mike


Re: [openstack-dev] [Heat] HOT Software configuration proposal

2013-10-15 Thread Mike Spreitzer
Steve Baker sba...@redhat.com wrote on 10/15/2013 06:48:53 PM:

 From: Steve Baker sba...@redhat.com
 To: openstack-dev@lists.openstack.org, 
 Date: 10/15/2013 06:51 PM
 Subject: [openstack-dev] [Heat] HOT Software configuration proposal
 
 I've just written some proposals to address Heat's HOT software 
 configuration needs, and I'd like to use this thread to get some 
feedback:
 https://wiki.openstack.org/wiki/Heat/Blueprints/hot-software-config

In that proposal, each component can use a different configuration 
management tool.

 
https://wiki.openstack.org/wiki/Heat/Blueprints/native-tools-bootstrap-config


In this proposal, I get the idea that it is intended that each Compute 
instance run only one configuration management tool.  At least, most of 
the text discusses the support (e.g., the idea that each CM tool supplies 
userdata to bootstrap itself) in terms appropriate for a single CM tool 
per instance; also, there is no discussion of combining userdata from 
several CM tools.

I agree with the separation of concerns issues that have been raised.  I 
think all this software config stuff can be handled by a pre-processor 
that takes an extended template in and outputs a plain template that can 
be consumed by today's heat engine (no extension to the heat engine 
necessary).
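
To make the shape of that idea concrete, here is a rough sketch of such a 
pre-processor (illustrative Python only; the helper and field names are 
made up, and a real one would of course do much more):

    def compile_userdata(component_defs):
        # Naive stand-in: just concatenate each component's config script.
        return "\n".join(c.get("config", "") for c in component_defs)

    def preprocess(extended_template):
        # Turn an extended template (with software components) into a plain
        # template that today's heat engine can consume unchanged.
        plain = dict(extended_template)
        components = plain.pop("components", {})
        for name, resource in plain.get("resources", {}).items():
            wanted = resource.pop("components", [])
            if wanted:
                props = resource.setdefault("properties", {})
                props["user_data"] = compile_userdata(
                    [components[c] for c in wanted])
        return plain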

Regards,
Mike


Re: [openstack-dev] [Heat] HOT Software configuration proposal

2013-10-15 Thread Mike Spreitzer
The threading in the archive includes this discussion under the "HOT 
Software orchestration proposal for workflows" heading, and the overall 
ordering in the archive looks very mixed up to me.  I am going to reply 
here, hoping that the new subject line will be subject to less strange 
ordering in the archive; this is really a continuation of the overall 
discussion, not just Steve Baker's proposal.

What is the difference between what today's heat engine does and a 
workflow?  I am interested to hear what you experts think, I hope it will 
be clarifying.  I presume the answers will touch on things like error 
handling, state tracking, and updates.

I see the essence of Steve Baker's proposal to be that of doing the 
minimal mods necessary to enable the heat engine to orchestrate software 
components.  The observation is that not much has to change, since the 
heat engine is already in the business of calling out to things and 
passing values around.  I see a little bit of a difference, maybe because 
I am too new to already know why it is not an issue.  In today's heat 
engine, the calls are made to fixed services to do CRUD operations on 
virtual resources in the cloud, using credentials managed implicitly; the 
services have fixed endpoints, even as the virtual resources come and go. 
Software components have no fixed service endpoints; the service endpoints 
come and go as the host Compute instances come and go; I did not notice a 
story about authorization for the software component calls.

Interestingly, Steve Baker's proposal reminds me a lot of Chef.  If you 
just rename Steve's "component" to "recipe", the alignment becomes really 
obvious; I am sure that is no accident.  I am not saying it is isomorphic 
--- clearly Steve Baker's proposal has more going on, with its cross-VM 
data dependencies and synchronization.  But let me emphasize that we can 
start to see a different way of thinking here.  Rather than focusing on a 
centrally-run workflow, think of each VM as independently running its own 
series of recipes --- with the recipe invocations now able to communicate 
and synchronize between VMs as well as within VMs.

Steve Baker's proposal uses two forms of communication and synchronization 
between VMs: (1) get_attr and (2) wait conditions and handles (sugar 
coated or not).  The implementation of (1) is part of the way the heat 
engine invokes components, the implementation of (2) is independent of the 
heat engine.

Using the heat engine for orchestration is limited to the kinds of logic 
that the heat engine can run.  This may be one reason people are 
suggesting using a general workflow engine.  However, the recipes 
(components) running in the VMs can do general computation; if we allow 
general cross-VM communication and synchronization as part of those 
general computations, we clearly have a more expressive system than the 
heat engine.

Of course, a general distributed computation can get itself into trouble 
(e.g., deadlock, livelock).  If we structure that computation as a set of 
components (recipe invocations) with a DAG of dependencies then we avoid 
those troubles.  And the kind of orchestration that the heat engine does 
is sufficient to invoke such components.
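
To illustrate the point (a sketch only, with made-up names): invoking a DAG 
of components just means running each one after the components it depends 
on, which is the same traversal the heat engine already does over resources.

    def run_components(components, depends_on):
        # components: name -> callable; depends_on: name -> iterable of names.
        # Run each component only after everything it depends on has run
        # (no error handling; this shows the traversal only).
        done = set()
        remaining = set(components)
        while remaining:
            ready = [c for c in remaining
                     if set(depends_on.get(c, ())) <= done]
            if not ready:
                raise RuntimeError("dependency cycle among components")
            for c in ready:
                components[c]()        # invoke the recipe / component
                done.add(c)
            remaining -= set(ready)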

Structuring software orchestration as a DAG of components also gives us a 
leg up on UPDATE.  Rather than asking the user to write a workflow for 
each different update, or a general meta-workflow that does introspection 
to decide what work needs to be done, we ask the thing that invokes the 
components to run through the components in the way that today's heat 
engine runs through resources for an UPDATE.

Lakshmi has been working on a software orchestration technique that is 
also centered on the idea of a DAG of components.  It was created before 
we got really interested in Heat.  It is implemented as a pre-processor that 
runs upstream of where today's heat engine goes, emitting fairly minimal 
userdata needed for bootstrapping.  The dependencies between recipe 
invocations are handled very smoothly in the recipes, which are written in 
Chef.  No hackery is needed in the recipe text at all (thanks to Ruby 
metaprogramming); what is needed is only an additional declaration of what 
are the cross-VM inputs and outputs of each recipe.  The propagation of 
data and synchronization between VMs is handled, under the covers, via 
simple usage of ZooKeeper (other implementations are reasonable too).  But 
the idea of heat-independent propagation of data and synchronization among 
a DAG of components is not limited to chef-based components, and can 
appear fairly smooth in any recipe language.
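
To give a feel for what that looks like, here is a toy sketch of the 
propagation layer (not our actual code; I am assuming a small key/value 
client, ZooKeeper in our case, and making up all the names):

    import json
    import time

    class CrossVmValues(object):
        # Toy stand-in for the propagation layer.  kv_store is assumed to
        # be a small key/value client (ZooKeeper in our case) offering
        # put(path, value) and get(path) -> value-or-None.
        def __init__(self, kv_store, deployment_id):
            self.kv = kv_store
            self.prefix = "/deployments/%s" % deployment_id

        def publish(self, name, value):
            # Called for each declared output of a recipe invocation.
            self.kv.put("%s/%s" % (self.prefix, name), json.dumps(value))

        def consume(self, name, poll_seconds=2):
            # Called for each declared input; blocks until the producing
            # VM has published the value.
            path = "%s/%s" % (self.prefix, name)
            while True:
                raw = self.kv.get(path)
                if raw is not None:
                    return json.loads(raw)
                time.sleep(poll_seconds)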

A value of making software orchestration independent of today's heat 
engine is that it enables the four-stage pipeline that I have sketched at 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
and whose ordering of functionality has been experimentally vetted with 
some non-trivial examples.  The first big one 

Re: [openstack-dev] [scheduler] Policy Model

2013-10-14 Thread Mike Spreitzer
Consider the example at 
https://docs.google.com/drawings/d/1nridrUUwNaDrHQoGwSJ_KXYC7ik09wUuV3vXw1MyvlY

We could indeed have distinct policy objects.  But I think they are policy 
*uses*, not policy *definitions* --- which is why I prefer to give them 
less prominent lifecycles.  In the example cited above, one policy use 
object might be: {id: some int, type: anti_collocation, properties: 
{level: rack}}, and there are four references to it; another policy use 
object might be {id: some int, type: network_reachability}, and there 
are three references to it.  What object should own the policy use 
objects?  You might answer that policy uses are owned by groups.  I do not 
think it makes sense to give them a more prominent lifecycle.  As I said, 
my preference would be to give them a less prominent lifecycle.  I would 
be happy to see each policy use owned by an InstanceGroupPolicy[Use] that 
references it and allow only one reference per policy use --- in other 
words, make the InstanceGroupPolicy[Use] class inherit from the Policy Use 
class.  And since I am not proposing that anything else inherit from the 
Policy Use class, I would even more prefer to see its contents simply 
merged inline into the InstanceGroupPolicy[Use] class.

Regards,
Mike



From:   Yathiraj Udupi (yudupi) yud...@cisco.com
To: Mike Spreitzer/Watson/IBM@IBMUS, 
Cc: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org
Date:   10/14/2013 01:38 PM
Subject:Re: [scheduler] Policy Model



Mike, 

Like I proposed in my previous email about the model and the APIs, 

About the InstanceGroupPolicy, why not leave it as is, and introduce a new 
abstract model class called Policy. 
The InstanceGroupPolicy will be a reference to a Policy object saved 
separately. 
and the policy field will point to the saved Policy object's unique name 
or id. 

The new class Policy – can have the usual fields – id, name, uuid, and a 
dictionary of key-value pairs for any additional arguments about the 
policy. 

This is in alignment with the model for InstanceGroupMember, which is a 
reference to an actual Instance Object saved in the DB. 

I will color all the diamonds black to make it a composition in the UML 
diagram. 

Thanks,
Yathi. 







From: Mike Spreitzer mspre...@us.ibm.com
Date: Monday, October 14, 2013 7:14 AM
To: Yathiraj Udupi yud...@cisco.com
Cc: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Subject: [scheduler] Policy Model

Could we agree on the following small changes to the model you posted last 
week?

1.  Rename InstanceGroupPolicy to InstanceGroupPolicyUse

2.  In InstanceGroupPolicy[Use], rename the "policy" field to 
"policy_type"

3.  Add an InstanceGroupPolicyUseProperty table, holding key/value pairs 
(two strings) giving the properties of the policy uses

4.  Color all the diamonds black 
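
Putting those four changes together, a rough sketch of what I have in mind 
(SQLAlchemy-flavored and purely illustrative; the table and column names 
are my guesses, not a patch):

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class InstanceGroup(Base):                     # stub, for context only
        __tablename__ = "instance_group"
        id = Column(Integer, primary_key=True)

    class InstanceGroupPolicyUse(Base):            # was InstanceGroupPolicy
        __tablename__ = "instance_group_policy_use"
        id = Column(Integer, primary_key=True)
        group_id = Column(Integer, ForeignKey("instance_group.id"))
        policy_type = Column(String(255))          # was the "policy" field

    class InstanceGroupPolicyUseProperty(Base):    # new key/value table
        __tablename__ = "instance_group_policy_use_property"
        id = Column(Integer, primary_key=True)
        policy_use_id = Column(
            Integer, ForeignKey("instance_group_policy_use.id"))
        key = Column(String(255))
        value = Column(String(255))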

Thanks, 
Mike



Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-14 Thread Mike Spreitzer
That came through beautifully formatted to me, but it looks much worse in 
the archive.  I'm going to use crude email tech here, so that I know it 
won't lose anything in handling.

Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/14/2013 01:17:47 
PM:

 I read your email where you expressed concerns regarding create-time
 dependencies, and I agree they are valid concerns to be addressed. 
 But like we all agree, as a starting point, we are just focusing on 
 the APIs for now, and will leave that aside as implementation 
 details to be addressed later. 

I am not sure I understand your language here.  To me, design decisions 
that affect what calls the clients make are not implementation details, 
they are part of the API design.

 Thanks for sharing your suggestions on how we can simplify the APIs.
 I think we are getting closer to finalizing this one. 
 
 Let us start at the model proposed here - 
 [1] https://docs.google.com/document/d/
 17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit?usp=sharing 
 (Ignore the white diamonds - they will be black, when I edit the doc)
 
 The InstanceGroup represents all the information necessary to 
 capture the group - nodes, edges, policies, and metadata
 
 InstanceGroupMember - is a reference to an Instance, which is saved 
 separately, using the existing Instance Model in Nova.

I think you mean this is a reference to either a group or an individual 
Compute instance.

 
 InstanceGroupMemberConnection - represents the edge
 
 InstanceGroupPolicy is a reference to a Policy, which will also be 
 saved separately, (currently not existing in the model, but has to 
 be created). Here in the Policy model, I don't mind adding any 
 number of additional fields, and key-value pairs to be able to fully
 define a policy.  I guess a Policy-metadata dictionary is sufficient
 to capture all the required arguments. 
 The InstanceGroupPolicy will be associated to a group as a whole or an 
edge.

Like I said under separate cover, I think one of these is a policy *use* 
rather than a policy *definition*.  I go further and emphasize that the 
interesting out-of-scope definitions are of policy *types*.  A policy type 
takes parameters.  For example, policies of the anti-collocation (AKA 
anti-affinity) type have a parameter that specifies the level in the 
physical hierarchy where the location must differ (rack, host, ...).  Each 
policy type specifies a set of parameters, just like a procedure specifies 
parameters; each use of a policy type supplies values for the parameters, 
just like a procedure invocation supplies values for the procedure's 
parameters.  I suggest separating parameter values from metadata; the 
former are described by the policy type, while the latter are unknown to 
the policy type and are there for other needs of the client.
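
In code terms, the analogy I am drawing is roughly the following 
(illustrative only; none of these classes exist today):

    class PolicyType(object):
        # The out-of-scope definition: names the parameters it takes, the
        # way a procedure declares its formal parameters.
        def __init__(self, name, parameter_names):
            self.name = name
            self.parameter_names = set(parameter_names)

    ANTI_COLLOCATION = PolicyType("anti_collocation", ["level"])

    class PolicyUse(object):
        # One application of a policy type: supplies actual parameter
        # values (checked against the type) plus metadata that the type
        # knows nothing about.
        def __init__(self, policy_type, parameters, metadata=None):
            unknown = set(parameters) - policy_type.parameter_names
            if unknown:
                raise ValueError("unknown parameters: %s" % sorted(unknown))
            self.policy_type = policy_type
            self.parameters = parameters      # e.g. {"level": "rack"}
            self.metadata = metadata or {}    # opaque to the policy type

    rack_spread = PolicyUse(ANTI_COLLOCATION, {"level": "rack"})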

Yes, a use of a policy type is associated with a group or an edge.  In my 
own writing I have suggested a third possibility: that a policy use can be 
directly associated with an individual resource.  It just so happens that 
the code my group already has been running also has your restriction: it 
supports only policies associated with groups and relationships.  But I 
suggested allowing direct attachment to resources (as well as 
relationships also being able to directly reference resources instead of 
groups) because I think this restriction --- while it simplifies 
implementation --- makes templates more verbose; I felt the latter was a 
more important consideration than the former.  If you want to roadmap this 
--- restricted first, liberal later --- that's fine with me.

 
 InstanceGroupMetadata - represents key-value dictionary for any 
 additional metadata for the instance group. 
 
 I think this should fully support what we care about - nodes, edges,
 policies and metadata. 
 
 Do we all agree ? 

Yes, with exceptions noted above.

 
 Now going to the APIs, 
 
 Register GROUP API (from my doc [1]): 
 
 POST  /v3.0/{tenant_id}/groups --- Register a group

In such specs it would be good to be explicit about the request parameters 
and body.  If I follow correctly, 
https://review.openstack.org/#/c/30028/25/doc/api_samples/os-instance-groups/instance-groups-post-req.json
 
shows us that you intended (as of that patch) the body to carry a group 
definition.
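
Spelled out, I would expect the body to look something along these lines 
(my guess at the shape, based on the model discussed here, not taken from 
that patch):

    group_registration_body = {
        "instance_group": {
            "name": "web-tier",
            "members": [                        # nodes
                {"name": "web-0"},
                {"name": "web-1"},
            ],
            "connections": [                    # edges
                {"from": "web-0", "to": "web-1"},
            ],
            "policies": [                       # policy uses
                {"type": "anti_collocation",
                 "properties": {"level": "rack"}},
            ],
            "metadata": {"owner": "example"},
        },
    }
    # POST /v3.0/{tenant_id}/groups with this as the JSON body, returning
    # the new group's UUID.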

 I think the confusion is only about when the members (all nested 
 members) and policies are saved in the DB (registered,
 but not CREATED actually), such that we can associate a UUID.  This 
 led to my original thinking that it is a 3-phase operation where we 
 have to register (save in DB) the nested members first, then 
 register the group as a whole.  But this is not client friendly. 
 
 Like I had suggested earlier, as an implementation detail of the 
 Group registration API (CREATE part 1 in your terminology), we can 
 support this: as part of the group registration transaction, 
 complete the registration of the nested members, get their UUIDs, 
 create 

Re: [openstack-dev] Scheduler meeting and Icehouse Summit

2013-10-14 Thread Mike Spreitzer
Yes, "Rethinking Scheduler Design" 
(http://summit.openstack.org/cfp/details/34) is not the same as the 
performance issue that Boris raised.  I think the former would be a 
natural consequence of moving to an optimization-based joint 
decision-making framework, because such a thing necessarily takes a "good 
enough" attitude.  The issue Boris raised is more efficient tracking of 
the "true state" of resources, and I am interested in that issue too.  A 
the true state of resources, and I am interested in that issue too.  A 
holistic scheduler needs such tracking, in addition to the needs of the 
individual services.  Having multiple consumers makes the issue more 
interesting :-)

Regards,
Mike


Re: [openstack-dev] [scheduler] Policy Model

2013-10-14 Thread Mike Spreitzer
Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/14/2013 11:43:34 
PM:

 ... 
 
 For the policy model, you can expect rows in the DB each 
 representing different policy instances something like- 

  {id: , uuid: SOME-UUID-1, name: anti-colocation-1,  type: 
 anti-colocation, properties: {level: rack}}

  {id: , uuid: SOME-UUID-2, name: anti-colocation-2,  type: 
 anti-colocation, properties: {level: PM}}

  {id: , uuid: SOME-UUID-3, name: network-reachabilty-1, 
 type: network-reachability properties: {}}
 
 And for the InstanceGroupPolicy model, you can expect rows such as 

 {id: 5, policy: SOME-UUID-1, type: group, edge_id: , 
 group_id: 12345}

 {id: 6, policy: SOME-UUID-1, type: group, edge_id: , 
 group_id: 22334} 

Do you imagine just one policy object of a given contents, or many?  Put 
another way, would every InstanceGroupPolicy object that wants to apply a 
rack-level anti-collocation policy use SOME-UUID-1?

Who or what created the record with id ?  Who or what decides to 
delete it, and when and why?  What about dangling references?  It seems to 
me that needing to answer these questions simply imposes unnecessary 
burdens.  If the type and properties fields of record id  were 
merged inline (replacing the policy:SOME-UUID-1 field) into records id 
, , and the other uses, then there are no hard questions to 
answer; the group author knows what policies he wants to apply and where, 
and he simply writes them there.
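
Concretely, instead of the two kinds of records quoted above, the same 
information could simply be written where it is used; a sketch:

    # Before: a use record pointing at a separately owned policy record.
    policy = {"uuid": "SOME-UUID-1", "type": "anti-colocation",
              "properties": {"level": "rack"}}
    policy_use = {"id": 5, "policy": "SOME-UUID-1",
                  "type": "group", "group_id": 12345}

    # After: the type and properties written inline in the use itself, so
    # nothing else has to own, share, or garbage-collect a policy row.
    policy_use = {"id": 5, "type": "group", "group_id": 12345,
                  "policy_type": "anti-colocation",
                  "policy_properties": {"level": "rack"}}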

Regards,
Mike


Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-11 Thread Mike Spreitzer
Regarding Alex's question of which component does holistic infrastructure 
scheduling, I hesitate to simply answer heat.  Heat is about 
orchestration, and infrastructure scheduling is another matter.  I have 
attempted to draw pictures to sort this out, see 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
and 
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g 
.  In those you will see that I identify holistic infrastructure 
scheduling as separate functionality from infrastructure orchestration 
(the main job of today's heat engine) and also separate from software 
orchestration concerns.  However, I also see a close relationship between 
holistic infrastructure scheduling and heat, as should be evident in those 
pictures too.

Alex made a remark about the needed inputs, and I agree but would like to 
expand a little on the topic.  One thing any scheduler needs is knowledge 
of the amount, structure, and capacity of the hosting thingies (I wish I 
could say resources, but that would be confusing) onto which the 
workload is to be scheduled.  Scheduling decisions are made against 
available capacity.  I think the most practical way to determine available 
capacity is to separately track raw capacity and current (plus already 
planned!) allocations from that capacity, finally subtracting the latter 
from the former.

In Nova, for example, sensing raw capacity is handled by the various 
nova-compute agents reporting that information.  I think a holistic 
infrastructure scheduler should get that information from the various 
individual services (Nova, Cinder, etc) that it is concerned with 
(presumably they have it anyway).

A holistic infrastructure scheduler can keep track of the allocations it 
has planned (regardless of whether they have been executed yet).  However, 
there may also be allocations that did not originate in the holistic 
infrastructure scheduler.  The individual underlying services should be 
able to report (to the holistic infrastructure scheduler, even if lowly 
users are not so authorized) all the allocations currently in effect.  An 
accurate union of the current and planned allocations is what we want to 
subtract from raw capacity to get available capacity.
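
In other words (a trivial sketch, just to pin down the arithmetic):

    def available_capacity(raw, current_allocations, planned_allocations):
        # available = raw - (current + planned), per capacity dimension;
        # assumes the two allocation lists do not overlap (no double count).
        used = {}
        for alloc in current_allocations + planned_allocations:
            for dimension, amount in alloc.items():
                used[dimension] = used.get(dimension, 0) + amount
        return {dim: raw[dim] - used.get(dim, 0) for dim in raw}

    # available_capacity({"vcpus": 32, "ram_mb": 131072},
    #                    [{"vcpus": 8, "ram_mb": 16384}],
    #                    [{"vcpus": 4, "ram_mb": 8192}])
    # -> {"vcpus": 20, "ram_mb": 106496}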

If there is a long delay between planning and executing an allocation, 
there can be nasty surprises from competitors --- if there are any 
competitors.  Actually, there can be nasty surprises anyway.  Any 
scheduler should be prepared for nasty surprises, and react by some 
sensible retrying.  If nasty surprises are rare, we are pretty much done. 
If nasty surprises due to the presence of competing managers are common, 
we may be able to combat the problem by changing the long delay to a short 
one --- by moving the allocation execution earlier into a stage that is 
only about locking in allocations, leaving all the other work involved in 
creating virtual resources to later (perhaps Climate will be good for 
this).  If the delay between planning and executing an allocation is short 
and there are many nasty surprises due to competing managers, then you 
have too much competition between managers --- don't do that.

Debo wants a simpler nova-centric story.  OK, how about the following. 
This is for the first step in the roadmap, where scheduling decisions are 
still made independently for each VM instance.  For the client/service 
interface, I think we can do this with a simple clean two-phase interface 
when traditional software orchestration is in play, a one-phase interface 
when slick new software orchestration is used.  Let me outline the 
two-phase flow.  We extend the Nova API with CRUD operations on VRTs 
(top-level groups).  For example, the CREATE operation takes a definition 
of a top-level group and all its nested groups, definitions (excepting 
stuff like userdata) of all the resources (only VM instances, for now) 
contained in those groups, all the relationships among those 
groups/resources, and all the applications of policy to those groups, 
resources, and relationships.  This is a REST-style interface; the CREATE 
operation takes a definition of the thing (a top-level group and all that 
it contains) being created; the UPDATE operation takes a revised 
definition of the whole thing.  Nova records the presented information; 
the familiar stuff is stored essentially as it is today (but marked as 
being in some new sort of tentative state), and the grouping, 
relationship, and policy stuff is stored according to a model like the one 
Debo and Yathi wrote.  The CREATE operation returns a UUID for the newly 
created top-level group.  The invocation of the top-level group CRUD is a 
single operation and it is the first of the two phases.  In the second 
phase of a CREATE flow, the client creates individual resources with the 
same calls as are used today, except that each VM instance create call is 
augmented with a pointer into the policy information.  That 

Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-11 Thread Mike Spreitzer
I'll be at the summit too.  Available Nov 4 if we want to do some prep 
then.  It will be my first summit, I am not sure how overbooked my summit 
time will be.

Regards,
Mike



From:   Sylvain Bauza sylvain.ba...@bull.net
To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
Cc: Mike Spreitzer/Watson/IBM@IBMUS
Date:   10/11/2013 08:19 AM
Subject:Re: [openstack-dev] [scheduler] APIs for Smart Resource 
Placement - Updated Instance Group Model and API extension model - WIP 
Draft



Long-story short, sounds like we do have the same concerns here in 
Climate.

I'll be present at the Summit, any chance to do an unconference meeting in 
between all parties?

Thanks,
-Sylvain

On 11/10/2013 08:25, Mike Spreitzer wrote:
 ...

Re: [openstack-dev] [Heat] HOT Software orchestration proposal for workflows

2013-10-09 Thread Mike Spreitzer
I favor separation of concerns.  I think (4), at least, has got nothing to 
do with infrastructure orchestration, the primary concern of today's heat 
engine.  I advocate (4), but as separate functionality.

Regards,
Mike

Alex Rudenko alexei.rude...@gmail.com wrote on 10/09/2013 12:59:22 PM:

 From: Alex Rudenko alexei.rude...@gmail.com
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Date: 10/09/2013 01:03 PM
 Subject: Re: [openstack-dev] [Heat] HOT Software orchestration 
 proposal for workflows
 
 Hi everyone,
 
 I've read this thread and I'd like to share some thoughts. In my 
 opinion, workflows (which run on VMs) can be integrated with heat 
 templates as follows:
 1. workflow definitions should be defined separately and processed 
 by stand-alone workflow engines (chef, puppet etc). 
 2. the HOT resources should reference workflows which they require, 
 specifying a type of workflow and the way to access a workflow 
 definition. The workflow definition might be provided along with HOT.
 3. Heat should treat the orchestration templates as transactions 
 (i.e. Heat should be able to rollback in two cases: 1) if something 
 goes wrong during processing of an orchestration workflow 2) when a 
 stand-alone workflow engine reports an error during processing of a 
 workflow associated with a resource)
 4. Heat should expose an API which enables basic communication 
 between running workflows. Additionally, Heat should provide an API 
 to workflows that allows workflows to specify whether they completed
 successfully or not. The reference to these APIs should be passed to
 the workflow engine that is responsible for executing workflows on VMs.
 Pros of each point:
 1 & 2 - keeps Heat simple and gives a possibility to choose the best
 workflows and engines among available ones.
 3 - adds some kind of all-or-nothing semantics improving the control
 and awareness of what's going on inside VMs.
 4 - allows workflow synchronization and communication through Heat 
 API. Provides the error reporting mechanism for workflows. If a 
 workflow does not need this functionality, it can ignore it.
 
 Cons:
 - Changes to existing workflows making them aware of Heat existence 
 are required.
 
 These thoughts might show some gaps in my understanding of how Heat 
 works, but I would like to share them anyway.
 
 Best regards,
 Oleksii Rudenko
 


Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-09 Thread Mike Spreitzer
Yes, there is more than the northbound API to discuss.  Gary started us 
there in the Scheduler chat on Oct 1, when he broke the issues down like 
this:

11:12:22 AM garyk: 1. a user facing API
11:12:41 AM garyk: 2. understanding which resources need to be tracked
11:12:48 AM garyk: 3. backend implementation

The full transcript is at 
http://eavesdrop.openstack.org/meetings/scheduling/2013/scheduling.2013-10-01-15.08.log.html

Alex Glikson glik...@il.ibm.com wrote on 10/09/2013 02:14:03 AM:
 
 Good summary. I would also add that in A1 the schedulers (e.g., in 
 Nova and Cinder) could talk to each other to coordinate. Besides 
 defining the policy, and the user-facing APIs, I think we should 
 also outline those cross-component APIs (need to think whether they 
 have to be user-visible, or can be admin). 
 
 Regards, 
 Alex 


Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-09 Thread Mike Spreitzer
Debojyoti Dutta ddu...@gmail.com wrote on 10/09/2013 02:48:26 AM:

 Mike, I agree we could have a cleaner API but I am not sure how
 cleanly it will integrate with current nova which IMO should be test
 we should pass (assuming we do cross services later)

I think the cleaner APIs integrate with Nova as well as the three phase 
API you suggested.  Am I missing some obvious impediment?

 ...
  To me the most frustrating aspect of this challenge is the need for the
  client to directly mediate the dependencies between resources; this is
  really what is driving us to do ugly things.  As I mentioned before, I am
  coming from a setting that does not have this problem.  So I am thinking
  about two alternatives: (A1) how clean can we make a system in which the
  client continues to directly mediate dependencies between resources, and
  (A2) how easily and cleanly can we make that problem go away.
 
 Am a little confused - How is the API dictating either A1 or A2? Isn't
 that a function of the implementation of the API.  For a moment let us
 assume that the black box implementation will be awesome and address
 your concerns.

I am talking about the client/service interface; it is not (just) a 
matter of service implementation.

My complaint is that the software orchestration technique commonly used 
prevents us from having a one-phase API for holistic infrastructure 
scheduling.  The commonly used software orchestration technique requires 
some serialization of the resource creation calls.  For example, if one VM 
instance runs a database and another VM instance runs a web server that 
needs to be configured with the private IP address of the database, the 
common technique is for the client to first create the database VM 
instance, then take the private IP address from that VM instance and use 
it to compose the userdata that is passed in the Nova call that creates 
the web server VM instance.  That client can not present all at once a 
fully concrete and literal specification of both VM instances, because the 
userdata for one is not knowable until the other has been created.  The 
client has to be able to make create-like calls in some particular order 
rather than ask for all creation at once.  If the client could ask for all 
creation at once then we could use a one-phase API: it simply takes a 
specification of the resources along with their policies and 
relationships.
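
Spelled out, the common technique forces an ordering like the following on 
the client (a sketch using a nova-client-like interface; treat the helpers 
and exact call signatures as illustrative):

    WEB_USERDATA = "#!/bin/sh\necho 'db_host={db_ip}' > /etc/web.conf\n"

    def create_pair(nova, image, flavor, wait_until_active, private_ip_of):
        # Phase 1: the database VM must be created (and reachable) first.
        db_vm = nova.servers.create(name="db", image=image, flavor=flavor)
        db_vm = wait_until_active(db_vm)
        db_ip = private_ip_of(db_vm)
        # Phase 2: only now can the web server's userdata be composed,
        # because it must literally contain the database VM's private IP.
        web_vm = nova.servers.create(
            name="web", image=image, flavor=flavor,
            userdata=WEB_USERDATA.format(db_ip=db_ip))
        return db_vm, web_vm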

Of course, there is another way out.  We do have in OpenStack a technology 
by which a client can present all at once a specification of many VM 
instances where the userdata of some depend on the results of creating 
others.  If we were willing to use this technology, we could follow A2. 
The CREATE flow would go like this: (i) the client presents the 
specification of resources (including the computations that link some), 
with grouping, relationships, and policies, to our new API; (ii) our new 
service registers the new topology and (once we advance this far on the 
development roadmap) does holistic scheduling; (iii) our new service 
updates the resource specifications to include pointers into the policy 
data; (iv) our new service passes the enhanced resource specifications to 
that other service that can do the creation calls linked by the prescribed 
computations; (v) that other service does its thing, causing a series 
(maybe with some allowed parallelism) of creation calls, each augmented by 
the relevant pointer into the policy information; (vi) the service 
implementing a creation call gets what it normally does plus the policy 
pointer, which it follows to get the relevant policy information (at the 
first step in the development roadmap) or the scheduling decision (in the 
second step of the development roadmap).  But I am getting ahead of myself 
here and discussing backend implementation; I think we are still working 
on the user-facing API.

 The question is this - does the current API help
 specify what we  want assuming we will be able to extend the notion of
 nodes, edges, policies and metadata?

I am not sure I understand that remark.  Of course the API you proposed is 
about enabling the client to express the policy information that we both 
advocate.  I am not sure I understand why you add the qualifier of 
"assuming we will be able to extend the notion of ...".  I do not think we 
(yet) have a policy type catalog set in stone, if that is the concern.  I 
think there is an interesting discussion to have about defining that 
catalog.

BTW, note that the class you called InstanceGroupPolicy is not just a 
reference to a policy, it also specifies one place where that policy is 
being applied.  That is really the class of policy applications (or 
uses).

I think some types of policies have parameters.  A relationship policy 
about limiting the number of network hops takes a parameter that is the 
hop count limit.  A policy about anti-collocation takes a physical 
hierarchy level as a parameter, to put a lower 

Re: [openstack-dev] [Climate] Questions and comments

2013-10-08 Thread Mike Spreitzer
Yes, that helps.  Please, guys, do not interpret my questions as 
hostility, I really am just trying to understand.  I think there is some 
overlap between your concerns and mine, and I hope we can work together.

Sticking to the physical reservations for the moment, let me ask for a 
little more explicit details.  In your outline below, late in the game you 
write the actual reservation is performed by the lease manager plugin. 
Is that the point in time when something (the lease manager plugin, in 
fact) decides which hosts will be used to satisfy the reservation?  Or is 
that decided up-front when the reservation is made?  I do not understand 
how the lease manager plugin can make this decision on its own, isn't the 
nova scheduler also deciding how to use hosts?  Why isn't there a problem 
due to two independent allocators making allocations of the same resources 
(the system's hosts)?

Thanks,
Mike

Patrick Petit patrick.pe...@bull.net wrote on 10/07/2013 07:02:36 AM:

 Hi Mike,
 
 There are actually more facets to this. Sorry if it's a little 
 confusing :-( Climate's original blueprint https://
 wiki.openstack.org/wiki/Blueprint-nova-planned-resource-reservation-api
 was about physical host reservation only. The typical use case 
 being: I want to reserve x number of hosts that match the 
 capabilities expressed in the reservation request. The lease is 
 populated with reservations which at this point are only capacity 
 descriptors. The reservation becomes active only when the lease 
 starts at a specified time and for a specified duration. The lease 
 manager plugin in charge of the physical reservation has a planning 
 of reservations that allows Climate to grant a lease only if the 
 requested capacity is available at that time. Once the lease becomes
 active, the user can request instances to be created on the reserved
 hosts using a lease handle as a Nova's scheduler hint. That's 
 basically it. We do not assume or enforce how and by whom (Nova, 
 Heat ,...) a resource instantiation is performed. In other words, a 
 host reservation is like a whole host allocation https://
 wiki.openstack.org/wiki/WholeHostAllocation that is reserved ahead 
 of time by a tenant in anticipation of some workloads that is bound 
 to happen in the future. Note that while we are primarily targeting 
 hosts reservations the same service should be offered for storage. 
 Now, Mirantis brought in a slew of new use cases that are targeted 
 toward virtual resource reservation as explained earlier by Dina. 
 While architecturally both reservation schemes (physical vs virtual)
 leverage common components, it is important to understand that they 
 behave differently. For example, Climate exposes an API for the 
 physical resource reservation that the virtual resource reservation 
 doesn't. That's because virtual resources are supposed to be already
 reserved (through some yet to be created Nova, Heat, Cinder,... 
 extensions) when the lease is created. Things work differently for 
 the physical resource reservation in that the actual reservation is 
 performed by the lease manager plugin not before the lease is 
 created but when the lease becomes active (or some time before 
 depending on the provisioning lead time) and released when the lease 
ends.
 HTH clarifying things.
 BR,
 Patrick 


Re: [openstack-dev] [Climate] Questions and comments

2013-10-08 Thread Mike Spreitzer
Sylvain: please do not interpret my questions as hostility.  I am only 
trying to understand your proposal, but I am still confused.  Can you 
please walk through a scenario involving Climate reservations on virtual 
resources?  I mean from start to finish, outlining which party makes which 
decision when, based on what.  I am trying to understand the relationship 
between the individual resource schedulers (such as nova, cinder) and 
climate --- they both seem to be about allocating the same resources.

Thanks,
Mike


Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-08 Thread Mike Spreitzer
Thanks for the clue about where the request/response bodies are 
documented.  Is there any convenient way to view built documentation for 
Havana right now?

You speak repeatedly of the desire for "clean" interfaces, and nobody 
could disagree with such words.  I characterize my desire that way too. It 
might help me if you elaborate a little on what "clean" means to you.  To 
me it is about minimizing the number of interactions between different 
modules/agents and the amount of information in those interactions.  In 
short, it is about making narrow interfaces - a form of simplicity.

To me the most frustrating aspect of this challenge is the need for the 
client to directly mediate the dependencies between resources; this is 
really what is driving us to do ugly things.  As I mentioned before, I am 
coming from a setting that does not have this problem.  So I am thinking 
about two alternatives: (A1) how clean can we make a system in which the 
client continues to directly mediate dependencies between resources, and 
(A2) how easily and cleanly can we make that problem go away.

For A1, we need the client to make a distinct activation call for each 
resource.  You have said that we should start the roadmap without joint 
scheduling; in this case, the scheduling can continue to be done 
independently for each resource and can be bundled with the activation 
call.  That can be the call we know and love today, the one that creates a 
resource, except that it needs to be augmented to also carry some pointer 
that points into the policy data so that the relevant policy data can be 
taken into account when making the scheduling decision.  Ergo, the client 
needs to know this pointer value for each resource.  The simplest approach 
would be to let that pointer be the combination of (p1) a VRT's UUID and 
(p2) the local name for the resource within the VRT.  Other alternatives 
are possible, but require more bookkeeping by the client.
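
For example, the augmented creation call might look roughly like this (a 
sketch; the hint name and plumbing are invented for illustration):

    def create_group_member(nova, image, flavor, vrt_uuid, local_name):
        # (vrt_uuid, local_name) is the pointer into the policy data that
        # was registered in the first phase; the scheduler follows it to
        # find the policies relevant to this one resource.
        return nova.servers.create(
            name=local_name, image=image, flavor=flavor,
            scheduler_hints={"policy_pointer": {"vrt_uuid": vrt_uuid,
                                                "local_name": local_name}})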

I think that at the first step of the roadmap for A1, the client/service 
interaction for CREATE can be in just two phases.  In the first phase the 
client presents a topology (top-level InstanceGroup in your terminology), 
including resource definitions, to the new API for registration; the 
response is a UUID for that registered top-level group.  In the second 
phase the client creates the resources as is done today, except that 
each creation call is augmented to carry the aforementioned pointer into 
the policy information.  Each resource scheduler (just nova, at first) can 
use that pointer to access the relevant policy information and take it 
into account when scheduling.  The client/service interaction for UPDATE 
would be in the same two phases: first update the policy and resource 
definitions at the new API, then do the individual resource updates in 
dependency order.

I suppose the second step in the roadmap is to have Nova do joint 
scheduling.  The client/service interaction pattern can stay the same. The 
only difference is that Nova makes the scheduling decisions in the first 
phase rather than the second.  But that is not a detail exposed to the 
clients.

Maybe the third step is to generalize beyond nova?

For A2, the first question is how to remove user-level create-time 
dependencies between resources.  We are only concerned with the 
user-level create-time dependencies here because it is only they that 
drive intimate client interactions.  There are also create-time 
dependencies due to the nature of the resource APIs; for example, you can 
not attach a volume to a VM until after both have been created.  But 
handling those kinds of create-time dependencies does not require intimate 
interactions with the client.  I know of two software orchestration 
technologies developed in IBM, and both have the property that there are 
no user-level create-time dependencies between resources; rather, the 
startup code (userdata) that each VM runs handles dependencies (using a 
library for cross-VM communication and synchronization).  This can even be 
done in plain CFN, using wait conditions and handles (albeit somewhat 
clunkily), right?  So I think there are ways to get this nice property 
already.  The next question is how best to exploit it to make cleaner 
APIs.  I think we can have a one-step client/service interaction: the 
client presents a top-level group (including leaf resource definitions) to 
the new service, which registers it and proceeds to 
create/schedule/activate the resources.

Regards,
Mike


Re: [openstack-dev] [Climate] Questions and comments

2013-10-07 Thread Mike Spreitzer
Do not worry about what I want, right now I am just trying to understand 
the Climate proposal, wrt virtual resources (Patrick helped a lot on the 
physical side).  Can you please walk through a scenario involving Climate 
reservations on virtual resources?  I mean from start to finish, outlining 
which party makes which decision, based on what.

Thanks,
Mike



From:   Sylvain Bauza sylvain.ba...@bull.net
To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
Cc: Mike Spreitzer/Watson/IBM@IBMUS
Date:   10/07/2013 05:07 AM
Subject:Re: [openstack-dev] [Climate] Questions and comments



Hi Mike,

Dina and you outlined some differences in terms of seeing what is 
dependent on what. 
As Dina explained, Climate plans to be integrated into Nova and Heat 
logics, where Heat and Nova would request Climate API by asking for a 
lease and would tag on their own the resources as 'RESERVED'.
On your point, and correct me if I'm wrong, you would rather see Climate 
on top of Heat and Nova, scheduling resources on its own, and only send 
creation requests to Heat and Nova. 

I'm happy to say both of you are right : Climate aims to be both called by 
Nova and *also* calling Nova. That's just matter of what Climate *is*. And 
here is the confusion.

That's why Climate is not only one API endpoint. It actually have two 
distinct endpoints : one called the Lease API endpoint, and one called the 
Resource Reservation API endpoint.

As a Climate developer working on physical hosts reservations (and not 
Heat stacks), my concern is to be able to guarantee to a REST client 
(either a user or another service) that if this user wants to provision X 
hosts on a specific timeframe in the future (immediate or in 10 years), 
Climate will be able to provision them. By meaning being able and 
guarantee, I do use strong words for stating that we engage ourselves to 
be able to plan what will be resources capacity state in the future.

This decision-making process (ie. this Climate scheduler) will be 
implemented as RPC Service for the Reservation API, and thus will needs to 
keep its own persistence layer in Climate. Of course, it will request the 
Lease API for really creating the lease and managing lease start/end 
hooks, that's the Lease API job.


Provided you would want to use the Reservation API for reserving Heat 
stacks, you would have to implement it tho.


Thanks,
-Sylvain

On 06/10/2013 20:41, Mike Spreitzer wrote:
Thanks, Dina.  Yes, we do not understand each other; can I ask some more 
questions? 

You outlined a two-step reservation process ("We assume the following 
reservation process for the OpenStack services..."), and right after that 
talked about changing your mind to use Heat instead of individual 
services.  So I am confused, I am not sure which of your remarks reflect 
your current thinking and which reflect old thinking.  Can you just state 
your current thinking? 

On what basis would Climate decide to start or stop a lease?  What sort of 
event notifications would Climate be sending, and when and why, and what 
would subscribers do upon receipt of such notifications? 

If the individual resource services continue to make independent 
scheduling decisions as they do today, what value does Climate add? 

Maybe a little more detailed outline of what happens in your current 
thinking, in support of an explicitly stated use case that shows the 
value, would help here. 

Thanks, 
Mike 





Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-07 Thread Mike Spreitzer
Thanks.  I have a few questions.  First, I am a bit stymied by the style 
of API documentation used in that document and many others: it shows the 
first line of an HTTP request but says nothing about all the other 
details.  I am sure some of those requests must have interesting bodies, 
but I am not always sure which ones have a body at all, let alone what 
goes in it.  I suspect there may be some headers that are important too. 
Am I missing something?

That draft says the VMs are created before the group.  Is there a way 
today to create a VM without scheduling it?

As I understand your draft, it lays out a three phase process for a client 
to follow: create resources without scheduling or activating them, then 
arrange them into groups, then schedule & activate them.  By "activate" I 
mean, for a VM instance, to start running it.  That ordering must hold 
independently for each resource.  Activations are invoked by the client in 
an order that is consistent with (a) runtime dependencies that are 
mediated directly by the client (e.g., string slinging in the heat engine) 
and (b) the nature of the resources (for example, you  can not attach a 
volume to a VM instance until after both have been created).  Other than 
those considerations, the ordering and/or parallelism is a degree of 
freedom available to the client.  Have I got this right?

Couldn't we simplify this into a two phase process: create groups and 
resources with scheduling, then activate the resources in an acceptable 
order?

FYI: my group is using Weaver as the software orchestration technique, so 
there are no runtime dependencies that are mediated directly by the 
client.  The client sees a very simple API: the client presents a 
definition of all the groups and resources, and the service first 
schedules it all then activates in an acceptable order.  (We already have 
something in OpenStack that can do activations in an acceptable order, 
right?)  Weaver is not the only software orchestration technique with this 
property.  The simplicity of this API is one reason I recommend software 
orchestration techniques that take dependency mediation out of the 
client's hands.  I hope that with coming work on HOT we can get OpenStack 
to this level of API simplicity.  But that struggle lies farther down the 
roadmap...

Thanks,
Mike

Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/07/2013 11:10:20 
PM:
 
 Hi, 
 
 Based on the discussions we have had in the past few scheduler sub-
 team meetings,  I am sharing a document that proposes an updated 
 Instance Group Model and API extension model. 
 This is a work-in-progress draft version, but sharing it for early 
feedback. 
 https://docs.google.com/document/d/
 17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit?usp=sharing 
 
 This model supports generic instance types, where an instance can 
 represent a virtual node of any resource type.  But in the context 
 of Nova, an instance refers to the VM instance. 
 
 This builds on the existing proposal for Instance Group Extension as
 documented here in this blueprint:  https://
 blueprints.launchpad.net/nova/+spec/instance-group-api-extension 
 
 Thanks,
 Yathi. 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

2013-10-07 Thread Mike Spreitzer
In addition to the other questions below, I was wondering if you could 
explain why you included all those integer IDs; aren't the UUIDs 
sufficient?

Thanks,
Mike



From:   Mike Spreitzer/Watson/IBM@IBMUS
To: Yathiraj Udupi (yudupi) yud...@cisco.com, 
Cc: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org
Date:   10/08/2013 12:41 AM
Subject:Re: [openstack-dev] [scheduler] APIs for Smart Resource 
Placement - Updated Instance Group Model and API extension model - WIP 
Draft



Thanks.  I have a few questions.  First, I am a bit stymied by the style 
of API documentation used in that document and many others: it shows the 
first line of an HTTP request but says nothing about all the other 
details.  I am sure some of those requests must have interesting bodies, 
but I am not always sure which ones have a body at all, let alone what 
goes in it.  I suspect there may be some headers that are important too. 
Am I missing something? 

That draft says the VMs are created before the group.  Is there a way 
today to create a VM without scheduling it? 

As I understand your draft, it lays out a three phase process for a client 
to follow: create resources without scheduling or activating them, then 
arrange them into groups, then schedule  activate them.  By activate I 
mean, for a VM instance, to start running it.  That ordering must hold 
independently for each resource.  Activations are invoked by the client in 
an order that is consistent with (a) runtime dependencies that are 
mediated directly by the client (e.g., string slinging in the heat engine) 
and (b) the nature of the resources (for example, you  can not attach a 
volume to a VM instance until after both have been created).  Other than 
those considerations, the ordering and/or parallelism is a degree of 
freedom available to the client.  Have I got this right? 

Couldn't we simplify this into a two phase process: create groups and 
resources with scheduling, then activate the resources in an acceptable 
order? 

FYI: my group is using Weaver as the software orchestration technique, so 
there are no runtime dependencies that are mediated directly by the 
client.  The client sees a very simple API: the client presents a 
definition of all the groups and resources, and the service first 
schedules it all then activates in an acceptable order.  (We already have 
something in OpenStack that can do activations in an acceptable order, 
right?)  Weaver is not the only software orchestration technique with this 
property.  The simplicity of this API is one reason I recommend software 
orchestration techniques that take dependency mediation out of the 
client's hands.  I hope that with coming work on HOT we can get OpenStack 
to this level of API simplicity.  But that struggle lies farther down the 
roadmap... 

Thanks, 
Mike 

Yathiraj Udupi (yudupi) yud...@cisco.com wrote on 10/07/2013 11:10:20 
PM: 
 
 Hi, 
 
 Based on the discussions we have had in the past few scheduler sub-
 team meetings,  I am sharing a document that proposes an updated 
 Instance Group Model and API extension model. 
 This is a work-in-progress draft version, but sharing it for early 
feedback. 
 https://docs.google.com/document/d/
 17OIiBoIavih-1y4zzK0oXyI66529f-7JTCVj-BcXURA/edit?usp=sharing 
 
 This model supports generic instance types, where an instance can 
 represent a virtual node of any resource type.  But in the context 
 of Nova, an instance refers to the VM instance. 
 
 This builds on the existing proposal for Instance Group Extension as
 documented here in this blueprint:  https://
 blueprints.launchpad.net/nova/+spec/instance-group-api-extension 
 
 Thanks, 
 Yathi. ___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Climate] Questions and comments

2013-10-06 Thread Mike Spreitzer
Thanks, Dina.  Yes, we do not understand each other; can I ask some more 
questions?

You outlined a two-step reservation process (We assume the following 
reservation process for the OpenStack services...), and right after that 
talked about changing your mind to use Heat instead of individual 
services.  So I am confused, I am not sure which of your remarks reflect 
your current thinking and which reflect old thinking.  Can you just state 
your current thinking?

On what basis would Climate decide to start or stop a lease?  What sort of 
event notifications would Climate be sending, and when and why, and what 
would subscribers do upon receipt of such notifications?

If the individual resource services continue to make independent 
scheduling decisions as they do today, what value does Climate add?

Maybe a little more detailed outline of what happens in your current 
thinking, in support of an explicitly stated use case that shows the 
value, would help here.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)

2013-10-02 Thread Mike Spreitzer
FYI, I have refined my pictures at 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
and 
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g 
to hopefully make it clearer that I agree with the sentiment that holistic 
infrastructure scheduling should not be part of heat but is closely 
related, and to make a graphical illustration of why I prefer the ordering 
of functionality that I do (the boundary between software and 
infrastructure issues gets less squiggly).

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-10-01 Thread Mike Spreitzer
Maybe the answer is hiding in plain sight: host aggregates.  This is a 
concept we already have, and it allows identification of arbitrary 
groupings for arbitrary purposes.___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling)

2013-10-01 Thread Mike Spreitzer
Clint Byrum cl...@fewbar.com wrote on 10/01/2013 02:38:53 AM:

 From: Clint Byrum cl...@fewbar.com
 To: openstack-dev openstack-dev@lists.openstack.org, 
 Date: 10/01/2013 02:40 AM
 Subject: Re: [openstack-dev] [scheduler] [heat] Policy specifics 
 (for holistic infrastructure scheduling)
 
 Mike, this has been really fun, but it is starting to feel like a
 rabbit hole.
 
 The case for having one feels legitimate. However, at this point, I 
think
 someone will need to actually build it, or the idea is just a pipe 
dream.

Yes, Clint, I and colleagues are interested in building it.  I think Debo 
and Yathi are too.  And we are finding intersections with other projects. 
I am still new here and learning lots, so outputs have not come as fast as 
I had initially hoped.  But I appreciate being able to have discussions 
and draw on the wisdom of the group.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] [heat] Policy specifics (for holistic infrastructure scheduling)

2013-09-30 Thread Mike Spreitzer
OK, let's take the holistic infrastructure scheduling out of Heat.  It 
really belongs at a lower level anyway.  Think of it as something you slap 
on top of Nova, Cinder, Neutron, etc. and everything that is going to use 
them goes first through the holistic scheduler, to give it a chance to 
make some joint decisions.  Zane has been worried about conflicting 
decisions being made, but if everything goes through the holistic 
infrastructure scheduling service then there does not need to be an issue 
with other parallel decision-making services (more on this below).  For a 
public cloud, think of this holistic infrastructure scheduling as part of 
the service that the cloud offers to the public; the public says what it 
wants, and the various levels of schedulers work on delivering it; the 
internals are not exposed to the public.  For example, a cloud user may 
say "spread my cluster across at least two racks, not too unevenly"; you 
do not want that public cloud customer to be in the business of knowing 
how many racks are in the cloud, knowing how much each one is currently 
being used, and picking which rack will contain which members of his 
cluster.  For a private cloud, the holistic infrastructure scheduler 
should have the same humility as the lower schedulers: offer enough 
visibility and control to the clients that they can make decisions if they 
want to (thus, nobody needs to go around the holistic infrastructure 
scheduler if they already know what they want).

You do not want to ask the holistic infrastructure scheduler to schedule 
resources one by one; you want to ask it to allocate a whole 
pattern/template/topology.  There is thus no need for infrastructure 
orchestration prior to holistic infrastructure scheduling.

Once the holistic infrastructure scheduler has done its job, there is a 
need for infrastructure orchestration.  What should we use for that?

OK, more on the business of conflicting decisions.  For the sake of 
scalability and modularity, the holistic infrastructure scheduler should 
delegate as much decision-making as it can to more specific services.  The 
job of the holistic infrastructure scheduler is to make joint decisions 
when there are strong interactions between services.  You can fudge this 
either way (have the holistic infrastructure scheduler make more or less 
decisions than ideal), but if you want the best then I think the principle 
I stated is what would guide.  So what if a delegated decision conflicts 
with a holistic decision?  Don't do that.  Divide the decision-making 
responsibilities into distinct domains, for example with the holistic 
scheduler making relatively big-picture decisions and individual resource 
services filling in the details.

That said, there can still be nasty surprises from lower layers.  Even if 
the design has carefully partitioned decision-making responsibilities, 
irregular things can still happen (e.g., authorized people can do 
something unexpected).  Even if nothing intentionally does anything 
irregular, there remains the possibility of bugs.  The holistic 
infrastructure scheduler should be prepared for nasty surprises, and 
getting information that is as authoritative as possible to begin with 
(promptness doesn't hurt either).

Then there is the question of the scalability of the holistic 
infrastructure scheduler.  One hard kernel of that is solving the 
optimization problem.  Nobody should expect the scheduler to find the 
truly optimal solution; this is an NP-hard problem.  However, there exist 
optimization algorithms that produce pretty good approximations in modest 
amounts of time.  Additionally: if the patterns are small relative to the 
size of the whole zone being scheduled then it should be possible to do 
concurrent decision-making with optimistic concurrency control (as Clint 
has mentioned).

You would not want one holistic infrastructure scheduler for a whole 
geographically distributed cloud.  You could use a hierarchical 
arrangement, with one top-level decision-maker dividing a pattern between 
availability zones (by which I mean the sort of large independent domains 
that are typically known by that term) and then a subsidiary scheduler for 
each availability zone.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-09-30 Thread Mike Spreitzer
Alex Glikson glik...@il.ibm.com wrote on 09/29/2013 03:30:35 PM:

 Mike Spreitzer mspre...@us.ibm.com wrote on 29/09/2013 08:02:00 PM:
 
  Another reason to prefer host is that we have other resources to 
  locate besides compute. 
 
 Good point. Another approach (not necessarily contradicting) could 
 be to specify the location as a property of host aggregate rather 
 than individual hosts (and introduce similar notion in Cinder, and 
 maybe Neutron). This could be an evolution/generalization of the 
 existing 'availability zone' attribute, which would specify a more 
 fine-grained location path (e.g., 
 'az_A:rack_R1:chassis_C2:node_N3'). We briefly discussed this 
 approach at the previous summit (see 'simple implementation' under 
 https://etherpad.openstack.org/HavanaTopologyAwarePlacement) -- but 
 unfortunately I don't think we made much progress with the actual 
 implementation in Havana (would be good to fix this in Icehouse). 

Thanks for the background.  I can still see the etherpad, but the old 
summit proposal to which it points is gone.

The etherpad proposes an API, and leaves open the question of whether it 
backs onto a common service.  I think that is a key question.  In my own 
group's work, this sort of information is maintained in a shared database. 
 I'm not sure what is the right approach for OpenStack.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-09-29 Thread Mike Spreitzer
Robert Collins robe...@robertcollins.net wrote on 09/29/2013 02:21:28 
AM:

 Host not hypervisor I think; consider nova baremetal, where hypervisor
 == machine that runs tftpd and makes IPMI calls, and host == place
 where the user workload will execute.

In nova baremetal, is there still a hypervisor in the picture, and is it 
necessarily the same machine as the host?

Another reason to prefer host is that we have other resources to locate 
besides compute.

But the current API maps a host to a list of uniformly-shaped contents, so 
it is not obvious to me what would be a good way to extend this.  Any ideas? 
Following is an example; it is the result of a GET on 
http://novahost:port/v2/tenantid/os-hosts/hostname


{
    "host":
    [
        {
            "resource":
            {
                "project": "(total)",
                "memory_mb": 96661,
                "host": "x3630r7n8",
                "cpu": 32,
                "disk_gb": 2216
            }
        },
        {
            "resource":
            {
                "project": "(used_now)",
                "memory_mb": 70144,
                "host": "x3630r7n8",
                "cpu": 34,
                "disk_gb": 880
            }
        },
        {
            "resource":
            {
                "project": "(used_max)",
                "memory_mb": 69632,
                "host": "x3630r7n8",
                "cpu": 34,
                "disk_gb": 880
            }
        },
        {
            "resource":
            {
                "project": "5e5e2b0da114499b838c8d24c31bea08",
                "memory_mb": 69632,
                "host": "x3630r7n8",
                "cpu": 34,
                "disk_gb": 880
            }
        }
    ]
}___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-09-29 Thread Mike Spreitzer
Monty Taylor mord...@inaugust.com wrote on 09/29/2013 01:38:26 PM:

 On 09/29/2013 01:02 PM, Mike Spreitzer wrote:
  Robert Collins robe...@robertcollins.net wrote on 09/29/2013 
02:21:28 AM:
  
  Host not hypervisor I think; consider nova baremetal, where 
hypervisor
  == machine that runs tftpd and makes IPMI calls, and host == place
  where the user workload will execute.
  
  In nova baremetal, is there still a hypervisor in the picture, and 
is
  it necessarily the same machine as the host?
 
 There is one or more machines where nova-compute runs. Those machines
 are necessarily _not_ the same machine as the host.

So the host is the bare metal machine where the user's image is 
instantiated and run, and some other machine runs nova-compute to help set 
that up, right?

When I do a GET on http://novahost:port/v2/tenantid/servers/instanceid 
today, in a Grizzly installation running VM instances on KVM hypervisors, 
I get back a bunch of attributes, including hostId (whose value is a long 
hex string), OS-EXT-SRV-ATTR:host (whose value is a short name), and 
OS-EXT-SRV-ATTR:hypervisor_hostname (whose value is the same short name as 
OS-EXT-SRV-ATTR:host). Those short names are the same ones appearing in the 
reply to a GET of http://novahost:port/v2/tenantid/os-hosts and also in 
the reply to a GET of http://novahost:port/v2/tenantid/os-hypervisors.  In 
the case of baremetal, will a GET of 
http://novahost:port/v2/tenantid/os-hypervisors return things related to 
baremetal and, if so, which ones (the nova-compute machines or the hosts)?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-09-28 Thread Mike Spreitzer
I have begun drafting a blueprint about more detailed host/hypervisor 
location information, to support the sort of policy-informed placement 
decision-making that Debo, Yathi, and I have been talking about.  The 
blueprint is at 
https://blueprints.launchpad.net/nova/+spec/hypervisor-location-attribute 
and the details are at 
https://wiki.openstack.org/wiki/Nova/HypervisorExtendedAttributes

You see I am a little schizophrenic here about scope.  The blueprint is 
named quite narrowly, and the details page is named more broadly; this is 
because I am not sure what else you folks will chime in with.

I am not sure whether this information should really be attached to a 
hypervisor or to a host.  I proposed hypervisor because currently the 
details for a hypervisor are a map (easily extended) whereas the details 
for a host are currently a list of uniformly-typed contents (not so easily 
extended).  But host might actually be more appropriate.  I am looking for 
feedback on whether/how to go that way.  BTW, where would I find 
documentation on host details?  The current page on nova extensions (
http://docs.openstack.org/api/openstack-compute/2/content/ext-compute.html
) is lacking most of them.

You will see that I have proposed what the API looks like, but not the 
source of this additional information.  I will ask my colleagues who have 
something like this locally, how they got it done and what they would 
recommend to OpenStack.  Perhaps you good folks have some suggestions.  Is 
there obviously one way to do it?  Is it obvious that there can be no one 
way and so a plug point is required?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [scheduler] [heat] Policy specifics

2013-09-27 Thread Mike Spreitzer
I have begun to draft some specifics about the sorts of policies that 
might be added to infrastructure to inform a smart unified placement 
engine.  These are cast as an extension to Heat templates.  See 
https://wiki.openstack.org/wiki/Heat/PolicyExtension .  Comments 
solicited.
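
To give a flavor of what I have in mind, here is a purely illustrative 
sketch (the policy type name and its properties are hypothetical, not the 
syntax the wiki page settles on) of an anti-collocation policy attached to 
two servers:

  Resources:
    Server1:
      Type: OS::Nova::Server
      Properties: { image: my-image, flavor: m1.small }
    Server2:
      Type: OS::Nova::Server
      Properties: { image: my-image, flavor: m1.small }
    KeepApart:
      Type: OS::Policy::AntiCollocation    # hypothetical policy resource
      Properties:
        Level: rack
        Members: [ Server1, Server2 ]

The point is that the policy is stated against the template as a whole, for 
the holistic scheduler to consume, rather than being something the heat 
engine itself has to act on.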

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] [heat] Policy specifics

2013-09-27 Thread Mike Spreitzer
Stephen Gran stephen.g...@theguardian.com wrote on 09/27/2013 04:26:37 
AM:

 Maybe I'm missing something obvious, but I'm not convinced all that 
 logic belongs in Heat.  I would expect nova and related components to 
 expose grouping information (availability zones in nova, networks in 
 quantum, etc) and for end users to supply the group by information.

Yes, this additional policy information is not intended to inform 
infrastructure orchestration.  It is intended to inform something that I 
have been calling holistic infrastructure scheduling and others have 
called things like unified resource placement and smart resource 
placement.  I frame it as an extension to Heat templates because this 
policy information needs to be added to a statement about a whole 
pattern/template/topology and Heat templates are the language we have for 
such things.  The idea is that holistic infrastructure scheduling comes 
before infrastructure orchestration; by the time infrastructure 
orchestration happens, the policy information has been handled and removed 
(or, possibly, encapsulated in some way for downstream processing --- but 
that's another story I am not trying to broach yet).

I have been discussing this outline here under the subject Bringing 
things together for Icehouse (
http://lists.openstack.org/pipermail/openstack-dev/2013-September/015118.html
), in the scheduler subgroup and heat weekly IRC meetings, and have a 
design summit proposal (http://summit.openstack.org/cfp/details/113).


 I think that your use case for anti-collocation (which is a very good 
 and important use case, don't get me wrong) is covered by using 
 availability zones/cells/regions and so on as they are, and doesn't 
 require much logic internal to Heat beyond obeying the constraint 
 specified by a user.

If there are five racks in the system and I want to say that two VMs 
should be placed on different racks, how do I say that with AZs without 
being overly specific?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)

2013-09-27 Thread Mike Spreitzer
Zane Bitter zbit...@redhat.com wrote on 09/27/2013 08:24:49 AM:

 Your diagrams clearly show scheduling happening in a separate stage to 
 (infrastructure) orchestration, which is to say that at the point where 
 resources are scheduled, their actual creation is in the *future*.
 
 I am not a Climate expert, but it seems to me that they have a 
 near-identical problem to solve: how do they integrate with Heat such 
 that somebody who has reserved resources in the past can actually create 

 them (a) as part of a Heat stack or (b) as standalone resources, at the 
 user's option. IMO OpenStack should solve this problem only once.

If I understand correctly, what Climate adds to the party is planning 
allocations to happen at some specific time in the non-immediate future. A 
holistic infrastructure scheduler is planning allocations to happen just 
as soon as we can get the plans through the relevant code path, which is 
why I describe it as now.


 If I understood your remarks correctly, we agree that there is no 
 (known) reason that the scheduling has to occur in the middle of 
 orchestration (which would have implied that it needed to be 
 incorporated in some sense into Heat).

If you agree that by orchestration you meant specifically infrastructure 
orchestration then we are agreed.  If software orchestration is also in 
the picture then I also agree that holistic infrastructure scheduling does 
not *have to* go in between software orchestration and infrastructure 
orchestration --- but I think that's a pretty good place for it.


 Right, so what I'm saying is that if all those things are _stated_ in 
 the input then there's no need to run the orchestration engine to find 
 out what they'll be; they're already stated.

Yep.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)

2013-09-27 Thread Mike Spreitzer
Sorry, I was a bit too hasty in writing the last part of my last message; 
I forgot to qualify software orchestration to indicate I am speaking 
only of its preparatory phase.  I should have written:

Zane Bitter zbit...@redhat.com wrote on 09/27/2013 08:24:49 AM:

...
 If I understood your remarks correctly, we agree that there is no 
 (known) reason that the scheduling has to occur in the middle of 
 orchestration (which would have implied that it needed to be 
 incorporated in some sense into Heat). 

If you agree that by orchestration you meant specifically infrastructure 
orchestration then we are agreed.  If software orchestration preparation 
is also in the picture then I also agree that holistic infrastructure 
scheduling does not *have to* go in between software orchestration 
preparation and infrastructure orchestration --- but I think that's a 
pretty good place for it.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] [heat] Policy specifics

2013-09-27 Thread Mike Spreitzer
Stephen Gran stephen.g...@theguardian.com wrote on 09/27/2013 10:46:09 
AM:

 If the admins of the openstack install wanted users to be able to select 

 placement by rack, surely the availability zones would be rack1 - rack5 
 ?  In this case, the user would write:
 
Resources : {
  MyASG : {
Type : AWS::AutoScaling::AutoScalingGroup,
Properties : {
  AvailabilityZones : { Fn::GetAZs : },
  MinSize : 2,
DesiredSize: 2,
  MaxSize : 2,
}
  },
 
 This should naturally implement placement as spread evenly across AZs.

You have added that DesiredSize property, to convey the idea of 
spreading across at least some number (2 in this case) of AZs, right? That 
is, it is not functionality in today's Nova, rather something we could 
add.

What if the cloud in question has several levels of structure available, 
and the admins want users to be able to spread at any of the available 
levels?

 I think maybe this is where I think my disagreement is.  Heat should be 
 able to express a user preference for placement, but only within the 
 bounds of the policy already created by the admins of the nova install. 
   To have Heat have more knowledge than what is available via the nova 
 API seems overcomplicated and fragile to me.  If the nova API should 
 grow some extensions to make more complicated placement algorithms 
 available, then that's an argument that might have legs.

I am trying to find a way to introduce holistic infrastructure scheduling, 
whose purpose in life is to do what Nova, Cinder, etc can not do on their 
own.  (Making Nova, Cinder, and friends more capable on their own is also 
good, and complements what I am advocating.)  Yes, this requires some more 
visibility and control from those individual services.  Those needs have 
come up in vague ways in the discussion so far, and I plan to write 
something specific.

I am distinctly NOT advocating mixing holistic infrastructure scheduling 
up with infrastructure orchestration.  I think holistic infrastructure 
scheduling happens prior to infrastructure orchestration.  The more 
debatable question is the ordering with respect to the preparatory stage 
of software orchestration.  Another complicating factor is what happens 
when the infrastructure orchestration calls out to something that makes a 
nested stack.

Regards,
Mike

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [scheduler] [heat] Policy specifics

2013-09-27 Thread Mike Spreitzer
Clint Byrum cl...@fewbar.com wrote on 09/27/2013 11:58:16 AM:

 From: Clint Byrum cl...@fewbar.com
 To: openstack-dev openstack-dev@lists.openstack.org, 
 Date: 09/27/2013 12:01 PM
 Subject: Re: [openstack-dev] [scheduler] [heat] Policy specifics
 
...
  Mike,
  These are not the kinds of specifics that are of any help at all in 
  figuring out how (or, indeed, whether) to incorporate holistic 
  scheduling into OpenStack.
 
 I agree that the things in that page are a wet dream of logical 
deployment
 fun. However, I think one can target just a few of the basic ones,
 and see a real achievable case forming. I think I grasp Mike's ideas,
 so I'll respond to your concerns with what I think. Note that it is
 highly likely I've gotten some of this wrong.

It remains to be seen whether those things can be anything more than a wet 
dream for OpenStack, but they are running code elsewhere, so I have hope. 
What I wrote is pretty much a dump of what we have.  The exception is the 
network bandwidth stuff, which our holistic infrastructure scheduler 
currently ignores because we do not have a way to get the relevant 
capacity information from the physical infrastructure.  Part of the agenda 
here is to nudge Neutron to improve in that way.

  - What would a holistic scheduling service look like? A standalone 
  service? Part of heat-engine?
 
 I see it as a preprocessor of sorts for the current infrastructure 
engine.
 It would take the logical expression of the cluster and either turn
 it into actual deployment instructions or respond to the user that it
 cannot succeed. Ideally it would just extend the same Heat API.

My own expectation is that it would be its own service, preceding 
infrastructure orchestration in the flow.  Alternatively, we could bundle 
holistic infrastructure scheduling, infrastructure orchestration, and 
software orchestration preparation together under one API but still 
maintained as fairly separate modules of functionality.  Or various in 
between ideas.  I do not yet have a strong reason for one choice over 
another.  I have been looking to gain cluefulness from discussion with you 
folks.

  - How will the scheduling service reserve slots for resources in 
advance 
  of them being created? How will those reservations be accounted for 
and 
  billed?
  - In the event that slots are reserved but those reservations are not 
  taken up, what will happen?
 
 I don't see the word "reserve" in Mike's proposal, and I don't think 
this
 is necessary for the more basic models like Collocation and 
Anti-Collocation.
 
 Reservations would of course make the scheduling decisions more likely 
to
 succeed, but it isn't necessary if we do things optimistically. If the
 stack create or update fails, we can retry with better parameters.

The raw truth of the matter is that even Nova has this problem already. 
The real ground truth of resource usage is in the hypervisor, not Nova. 
When Nova makes a decision, it really is provisional until confirmed by 
the hypervisor.  I have heard of cases, in different cloud software, where 
the thing making the placement decisions does not have a truly accurate 
picture of the resource usage.  These are typically caused by corner cases 
in failure scenarios, where the decision maker thinks something did not 
happen or was successfully deleted but in reality there is a zombie left 
over consuming some resources in the hypervisor.  There are probably cases 
where this can happen in OpenStack too, I am guessing.  Also, OpenStack 
does not prevent someone from going around Nova and directly asking a 
hypervisor to do something.

  - Once scheduled, how will resources be created in their proper slots 
as 
  part of a Heat template?
 
 In goes a Heat template (sorry for not using HOT.. still learning it. ;)
 
 Resources:
   ServerTemplate:
 Type: Some::Defined::ProviderType
   HAThing1:
 Type: OS::Heat::HACluster
 Properties:
   ClusterSize: 3
   MaxPerAZ: 1
   PlacementStrategy: anti-collocation
   Resources: [ ServerTemplate ]
 
 And if we have at least 2 AZ's available, it feeds to the heat engine:
 
 Resources:
   HAThing1-0:
 Type: Some::Defined::ProviderType
   Parameters:
 availability-zone: zone-A
   HAThing1-1:
 Type: Some::Defined::ProviderType
   Parameters:
 availability-zone: zone-B
   HAThing1-2:
 Type: Some::Defined::ProviderType
   Parameters:
 availability-zone: zone-A
 
 If not, holistic scheduler says back I don't have enough AZ's to
 satisfy MaxPerAZ.

Actually, I was thinking something even simpler (in the simple cases :-). 
By simple cases I mean where the holistic infrastructure scheduler makes 
all the placement decisions.  In that case, it only needs to get Nova to 
implement the decisions already made.  So the API call or template 
fragment for a VM instance would include an AZ parameter that specifies 
the particular host already chosen for that VM instance.  Similarly for 

Re: [openstack-dev] [scheduler] [heat] Policy specifics

2013-09-27 Thread Mike Spreitzer
Zane also raised an important point about value.  Any scheduler is serving 
one master most directly, the cloud provider.  Any sane cloud provider has 
some interest in serving the interests of the cloud users, as well as 
having some concerns of its own.  The way my group has resolved this is in 
the translation from the incoming requests to the underlying optimization 
problem that is solved for placement; in that translation we fold in the 
cloud provider's interests as well as the cloud user's.  We currently have 
a fixed opinion of the cloud provider's interests; generalizing that is a 
possible direction for future progress.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Generalising racks :- modelling a datacentre

2013-09-25 Thread Mike Spreitzer
I agree that such a thing is useful for scheduling.  I see a bit of a 
tension here: for software engineering reasons we want some independence, 
but we also want to avoid wasteful duplication.

I think we are collectively backing into the problem of metamodeling for 
datacenters, and establishing one or more software thingies that will 
contain/communicate datacenter models.  A collection of nodes annotated 
with tags is a metamodel.  You could define a graph-based metamodel 
without mandating any particular graph shape.  You could be more 
prescriptive and mandate a tree shape as a good compromise between 
flexibility and making something that is reasonably easy to process.  We 
can debate what the metamodel should be, but that is different from 
debating whether there is a metamodel.
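
For concreteness, a tree-shaped metamodel could be as simple as the 
following sketch (the level names here are just examples, not a proposal):

  datacentre:
    rows:
      - name: row-1
        racks:
          - name: rack-1a
            pdus: [ pdu-7, pdu-8 ]
            switches: [ sw-3 ]
            nodes: [ node-001, node-002, node-003 ]

Tags could express the same facts, but the tree makes the containment 
relationships explicit and is reasonably easy for a scheduler to walk.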

Regards,
Mike



From:   Tomas Sedovic tsedo...@redhat.com
To: openstack-dev@lists.openstack.org, 
Date:   09/25/2013 10:37 AM
Subject:Re: [openstack-dev] [TripleO] Generalising racks :- 
modelling a datacentre



On 09/25/2013 05:15 AM, Robert Collins wrote:
 One of the major things Tuskar does is model a datacenter - which is
 very useful for error correlation, capacity planning and scheduling.

 Long term I'd like this to be held somewhere where it is accessible
 for schedulers and ceilometer etc. E.g. network topology + switch
 information might be held by neutron where schedulers can rely on it
 being available, or possibly held by a unified topology db with
 scheduler glued into that, but updated by neutron / nova / cinder.
 Obviously this is a) non-trivial and b) not designed yet.

 However, the design of Tuskar today needs to accomodate a few things:
   - multiple reference architectures for clouds (unless there really is
 one true design)
   - the fact that today we don't have such an integrated vertical 
scheduler.

 So the current Tuskar model has three constructs that tie together to
 model the DC:
   - nodes
   - resource classes (grouping different types of nodes into service
 offerings - e.g. nodes that offer swift, or those that offer nova).
   - 'racks'

 AIUI the initial concept of Rack was to map to a physical rack, but
 this rapidly got shifted to be 'Logical Rack' rather than physical
 rack, but I think of Rack as really just a special case of a general
 modelling problem..

Yeah. Eventually, we settled on Logical Rack meaning a set of nodes on 
the same L2 network (in a setup where you would group nodes into 
isolated L2 segments). Which kind of suggests we come up with a better 
name.

I agree there's a lot more useful stuff to model than just racks (or 
just L2 node groups).


From a deployment perspective, if you have two disconnected
 infrastructures, thats two AZ's, and two underclouds : so we know that
 any one undercloud is fully connected (possibly multiple subnets, but
 one infrastructure). When would we want to subdivide that?

 One case is quick fault aggregation: if a physical rack loses power,
 rather than having 16 NOC folk independently investigating the same 16
 down hypervisors, one would prefer to identify that the power to the
 rack has failed (for non-HA powered racks); likewise if a single
 switch fails (for non-HA network topologies) you want to identify that
 that switch is down rather than investigating all the cascaded errors
 independently.

 A second case is scheduling: you may want to put nova instances on the
 same switch as the cinder service delivering their block devices, when
 possible, or split VM's serving HA tasks apart. (We currently do this
 with host aggregates, but being able to do it directly would be much
 nicer).

 Lastly, if doing physical operations like power maintenance or moving
 racks around in a datacentre, being able to identify machines in the
 same rack can be super useful for planning, downtime announcements, 
orhttps://plus.google.com/hangouts/_/04919b4400b8c4c5ba706b752610cd433d9acbe1
 host evacuation, and being able to find a specific machine in a DC is
 also important (e.g. what shelf in the rack, what cartridge in a
 chassis).

I agree. However, we should take care not to commit ourselves to 
building a DCIM just yet.


 Back to 'Logical Rack' - you can see then that having a single
 construct to group machines together doesn't really support these use
 cases in a systematic fasion:- Physical rack modelling supports only a
 subset of the location/performance/failure use cases, and Logical rack
 doesn't support them at all: we're missing all the rich data we need
 to aggregate faults rapidly : power, network, air conditioning - and
 these things cover both single machine/groups of machines/racks/rows
 of racks scale (consider a networked PDU with 10 hosts on it - thats a
 fraction of a rack).

 So, what I'm suggesting is that we model the failure and performance
 domains directly, and include location (which is the incremental data
 racks add once failure and performance domains are modelled) too. We
 can separately noodle on 

Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)

2013-09-25 Thread Mike Spreitzer
Clint wrote:

 There is a third stealth-objective that CFN has caused to linger in
 Heat. That is packaging cloud applications. By allowing the 100%
 concrete CFN template to stand alone, users can ship the template.
 
 IMO this marrying of software assembly, config, and orchestration is a
 concern unto itself, and best left outside of the core infrastructure
 orchestration system.

I favor separation of concerns.  I do not follow what you are suggesting 
about how to separate these particular concerns.  Can you elaborate?

Clint also wrote:

 A ruby DSL is not something I think is ever going to happen in
 OpenStack.

Ruby is particularly good when the runtime scripting is done through chef 
or puppet, which are based on Ruby.  For example, Weaver supports chef 
based scripting, and integrates in a convenient way.

A distributed system does not all have to be written in the same language.

Thomas wrote:

 I don't fully get this idea of HOT consuming a monolithic model produced 
by
 some compiler - be it Weaver or anything else.
 I thought the goal was to develop HOT in a way that users can actually
 write HOT, as opposed to having to use some compiler to produce some
 useful model.
 So wouldn't it make sense to make sure we add the right concepts to HOT 
to
 make sure we are able to express what we want to express and have things
 like composability, re-use, substitutability?

I am generally suspicious of analogies, but let me offer one here.  In the 
realm of programming languages, many have great features for modularity 
within one source file.  These features are greatly appreciated and used. 
But that does not stop people from wanting to maintain sources factored 
into multiple files.

Back to the world at hand, I do not see a conflict between (1) making a 
language for monoliths with sophisticated internal structure and (2) 
defining one or more languages for non-monolithic sources.

Thomas wrote:
 As said in my comment above, I would like to see us focusing on the
 agreement of one language - HOT - instead of yet another DSL.
 There are things out there that are well established (like chef or 
puppet),
 and HOT should be able to efficiently and intuitively use those things 
and
 orchestrate components built using those things.

Yes, it may be that our best tactic at this point is to allow multiple 
(2), some or all not defined through the OpenStack Foundation, while 
agreeing here on (1).

Thomas wrote:
 Anyway, this might be off the track that was originally discussed in 
this
 thread (i.e. holistic scheduling and so on) ...

We are engaged in a boundary-drawing and relationship-drawing exercise.  I 
brought up this idea of a software orchestration compiler to show why I 
think the software orchestration preparation stage is best done earlier 
rather than later.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-25 Thread Mike Spreitzer
Debo, Yathi: I have read 
https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1
 
and most of the referenced materials, and I have a couple of big-picture 
questions.  That document talks about making Nova call out to something 
that makes the sort of smart decisions you and I favor.  As far as I know, 
Nova is still scheduling one thing at a time.  How does that smart 
decision maker get a look at the whole pattern/termplate/topology as soon 
as it is needed?  I think you intend the smart guy gets it first, before 
Nova starts getting individual VM calls, right?  How does this picture 
grow to the point where the smart guy is making joint decisions about 
compute, storage, and network?  I think the key idea has to be that the 
smart guy gets a look at the whole problem first, and makes its decisions, 
before any individual resources are requested from 
nova/cinder/neutron/etc.  I think your point about "non-disruptive, works 
with the current nova architecture" is about solving the problem of how 
the smart guy's decisions get into nova.  Presumably this problem will 
occur for cinder and so on, too.  Have I got this right?

There is another way, right?  Today Nova accepts an 'availability zone' 
argument whose value can specify a particular host.  I am not sure about 
Cinder, but you can abuse volume types to get this job done.
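
For example (a sketch with a made-up host name, relying on the zone:host 
form of the availability zone argument that Nova accepts, and assuming Heat 
passes the value through unmodified):

  Resources:
    DbServer:
      Type: AWS::EC2::Instance
      Properties:
        ImageId: my-image
        InstanceType: m1.small
        AvailabilityZone: nova:compute-rack2-n7   # zone:host, i.e. the host the smart guy already chose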

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Basic questions about climate

2013-09-24 Thread Mike Spreitzer
Climate is about reserving resources.  Are those physical resources or 
virtual ones?  Where was I supposed to read the answer to basic questions 
like that?

If climate is about reserving virtual resources, how is that different 
from scheduling them?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse (now featuring software orchestration)

2013-09-24 Thread Mike Spreitzer
Let me elaborate a little on my thoughts about software orchestration, and 
respond to the recent mails from Zane and Debo.  I have expanded my 
picture at 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
and added a companion picture at 
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g 
that shows an alternative.

One of the things I see going on is discussion about better techniques for 
software orchestration than are supported in plain CFN.  Plain CFN allows 
any script you want in userdata, and prescription of certain additional 
setup elsewhere in cfn metadata.  But it is all mixed together and very 
concrete.  I think many contributors would like to see something with more 
abstraction boundaries, not only within one template but also the ability 
to have modular sources.

I work closely with some colleagues who have a particular software 
orchestration technology they call Weaver.  It takes as input for one 
deployment not a single monolithic template but rather a collection of 
modules.  Like higher level constructs in programming languages, these 
have some independence and can be re-used in various combinations and 
ways.  Weaver has a compiler that weaves together the given modules to 
form a monolithic model.  In fact, the input is a modular Ruby program, 
and the Weaver compiler is essentially running that Ruby program; this 
program produces the monolithic model as a side effect.  Ruby is a pretty 
good language in which to embed a domain-specific language, and my 
colleagues have done this.  The modular Weaver input mostly looks 
declarative, but you can use Ruby to reduce the verboseness of, e.g., 
repetitive stuff --- as well as plain old modularity with abstraction.  We 
think the modular Weaver input is much more compact and better for human 
reading and writing than plain old CFN.  This might not be obvious when 
you are doing the hello world example, but when you get to realistic 
examples it becomes clear.

The Weaver input discusses infrastructure issues, in the rich way Debo and 
I have been advocating, as well as software.  For this reason I describe 
it as an integrated model (integrating software and infrastructure 
issues).  I hope for HOT to evolve to be similarly expressive to the 
monolithic integrated model produced by the Weaver compiler.

In Weaver, as well as in some of the other software orchestration 
technologies being discussed, there is a need for some preparatory work 
before the infrastructure (e.g., VMs) is created.  This preparatory stage 
begins the implementation of the software orchestration abstractions. Here 
is the translation from something more abstract into flat userdata and 
other cfn metadata.  For Weaver, this stage also involves some 
stack-specific setup in a distinct coordination service.  When the VMs 
finally run their userdata, the Weaver-generated scripts there use that 
pre-configured part of the coordination service to interact properly with 
each other.

I think that, to a first-order approximation, the software orchestration 
preparatory stage commutes with holistic infrastructure scheduling.  They 
address independent issues, and can be done in either order.  That is why 
I have added a companion picture; the two pictures show the two orders.

My claim of commutativity is limited, as I and colleagues have 
demonstrated only one of the two orderings; the other is just a matter of 
recent thought.  There could be gotchas lurking in there.

Between the two orderings, I have a preference for the one I first 
mentioned and have experience with actually running.  It has the virtue of 
keeping related things closer together: the software orchestration 
compiler is next to the software orchestration preparatory stage, and the 
holistic infrastructure scheduling is next to the infrastructure 
orchestration.

In response to Debo's remark about flexibility: I am happy to see an 
architecture that allows either ordering if it turns out that they are 
both viable and the community really wants that flexibility.  I am not so 
sure we can totally give up on architecting where things go, but this 
level of flexibility I can understand and get behind (provided it works).

Just as a LP solver is a general utility whose uses do not require 
architecting, I can imagine a higher level utility that solves abstract 
placement problems.  Actually, this is not a matter of imagination.  My 
group has been evolving such a thing for years.  It is now based, as Debo 
recommends, on a very flexible and general optimization algorithm.  But 
the plumbing between it and the rest of the system is significant; I would 
not expect many users to take on that magnitude of task.

I do not really want to get into dogmatic fights over what gets labelled 
heat.  I will leave the questions about which piece goes where in the 
OpenStack programs and projects to those more informed and anointed.  What 
I am trying to 

Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-23 Thread Mike Spreitzer
I was not trying to raise issues of geographic dispersion and other higher 
level structures, I think the issues I am trying to raise are relevant 
even without them.  This is not to deny the importance, or relevance, of 
higher levels of structure.  But I would like to first respond to the 
discussion that I think is relevant even without them.

I think it is valuable for OpenStack to have a place for holistic 
infrastructure scheduling.  I am not the only one to argue for this, but I 
will give some use cases.  Consider Hadoop, which stresses the path 
between Compute and Block Storage.  In the usual way of deploying and 
configuring Hadoop, you want each data node to be using directly attached 
storage.  You could address this by scheduling one of those two services 
first, and then the second with constraints from the first --- but the 
decisions made by the first could paint the second into a corner.  It is 
better to be able to schedule both jointly.  Also consider another 
approach to Hadoop, in which the block storage is provided by a bank of 
storage appliances that is equidistant (in networking terms) from all the 
Compute.  In this case the Storage and Compute scheduling decisions have 
no strong interaction --- but the Compute scheduling can interact with the 
network (you do not want to place Compute in a way that overloads part of 
the network).

Once a holistic infrastructure scheduler has made its decisions, there is 
then a need for infrastructure orchestration.  The infrastructure 
orchestration function is logically downstream from holistic scheduling. I 
do not favor creating a new and alternate way of doing infrastructure 
orchestration in this position.  Rather I think it makes sense to use 
essentially today's heat engine.

Today Heat is the only thing that takes a holistic view of 
patterns/topologies/templates, and there are various pressures to expand 
the mission of Heat.  A marquee expansion is to take on software 
orchestration.  I think holistic infrastructure scheduling should be 
downstream from the preparatory stage of software orchestration (the other 
stage of software orchestration is the run-time action in and supporting 
the resources themselves).  There are other pressures to expand the 
mission of Heat too.  This leads to conflicting usages for the word 
heat: it can mean the infrastructure orchestration function that is the 
main job of today's heat engine, or it can mean the full expanded mission 
(whatever you think that should be).  I have been mainly using heat in 
that latter sense, but I do not really want to argue over naming of bits 
and assemblies of functionality.  Call them whatever you want.  I am more 
interested in getting a useful arrangement of functionality.  I have 
updated my picture at 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
--- do you agree that the arrangement of functionality makes sense?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-23 Thread Mike Spreitzer
Someone earlier asked for greater clarity about infrastructure 
orchestration, so here is my view.  I see two main issues: (1) deciding 
the order in which to do things, and (2) doing them in an acceptable 
order.  That's an oversimplified wording because, in general, some 
parallelism is possible.  In general, the set of things to do is 
constrained by a partial order --- and that partial order comes from two 
sources.  One is the nature of the downstream APIs.  For examples, you can 
not attach a volume or floating IP address to a VM until after both have 
been created.  The other source of ordering constraints is upstream 
decision makers.  Decisions made upstream are conveyed into today's heat 
engine by data dependencies between resources in a heat template.  The 
heat engine is not making those decisions.  It is not a source of 
important ordering constraints.  When the ordering constraints actually 
allow some parallelism --- they do not specify a total order --- the heat 
engine has freedom in which of that parallelism to exploit vs flatten into 
sequential ordering.  What today's heat engine does is make its available 
choices about that and issue the operations, keeping track of IDs and 
outcomes.  I have been using the term infrastructure orchestration to 
refer to this latter job (issuing infrastructure operations with 
acceptable ordering/parallelism), not the decision-making of upstream 
agents.  This might be confusing; I think the plain English meaning of 
orchestration suggests decision-making as well as execution.
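
To make the volume example concrete, here is a minimal template fragment 
(image and flavor names are placeholders); the Ref data dependencies are 
exactly the sort of ordering constraint I mean, since the engine must 
create MyServer and MyVolume, in either order or in parallel, before it can 
create MyAttachment:

  Resources:
    MyServer:
      Type: AWS::EC2::Instance
      Properties:
        ImageId: my-image
        InstanceType: m1.small
    MyVolume:
      Type: AWS::EC2::Volume
      Properties:
        Size: 10
        AvailabilityZone: nova
    MyAttachment:
      Type: AWS::EC2::VolumeAttachment
      Properties:
        InstanceId: { Ref: MyServer }   # cannot happen until the server exists
        VolumeId: { Ref: MyVolume }     # cannot happen until the volume exists
        Device: /dev/vdb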

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Medium Availability VMs

2013-09-20 Thread Mike Spreitzer
 From: Tim Bell tim.b...@cern.ch
 ...
 Is this something that will be added into OpenStack or made 
 available as open source through something like stackforge ?

I and some others think that the OpenStack architecture should have a 
place for holistic infrastructure scheduling.  I also think this is an 
area where vendors will want to compete; I think my company has some 
pretty good technology for this and will want to sell it for money.  
https://wiki.openstack.org/wiki/Open requires that the free OpenStack 
includes a pretty good implementation of this function too, and I think 
others have some they want to contribute.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-20 Thread Mike Spreitzer
I have written a new outline of my thoughts, you can find it at 
https://docs.google.com/document/d/1RV_kN2Io4dotxZREGEks9DM0Ih_trFZ-PipVDdzxq_E

It is intended to stand up better to independent study.  However, it is 
still just an outline.  I am still learning about stuff going on in 
OpenStack, and am learning and thinking faster than I can write.  Trying 
to figure out how to cope.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Fwd: [Openstack-devel] PGP key signing party during the HK summit

2013-09-20 Thread Mike Spreitzer
What's the threat model here?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] How the autoscale API should control scaling in Heat

2013-09-19 Thread Mike Spreitzer
I'd like to try to summarize this discussion, if only to see 
whether I have correctly understood it.  There is a lot of consensus, but 
I haven't heard from Adrian Otto since he wrote some objections.  I'll 
focus on trying to describe the consensus; Adrian's concerns are already 
collected in a single message.  Or maybe this is already written in some 
one place?

The consensus is that there should be an autoscaling (AS) service that is 
accessible via its own API.  This autoscaling service can scale anything 
describable by a snippet of Heat template (it's not clear to me exactly 
what sort of syntax this is; is it written up anywhere?).  The autoscaling 
service is stimulated into action by a webhook call.  The user has the 
freedom to arrange calls on that webhook in any way she wants.  It is 
anticipated that a common case will be alarms raised by Ceilometer.  For 
more specialized or complicated logic, the user is free to wire up 
anything she wants to call the webhook.

An instance of the autoscaling service maintains an integer variable, 
which is the current number of copies of the thing being autoscaled.  Does 
the webhook call provide a new number, a +1/-1 signal, or something else?

There was some discussion of a way to indicate which individuals to 
remove, in the case of decreasing the multiplier.  I suppose that would be 
an option in the webhook, and one that will not be exercised by Ceilometer 
alarms.

(It seems to me that there is not much "auto" in this autoscaling service 
--- it is really a scaling service driven by an external controller.  This 
is not a criticism, I think this is a good factoring --- but maybe not the 
best naming.)

The autoscaling service does its job by multiplying the heat template 
snippet (the thing to be autoscaled) by the current number of copies and 
passing this derived template to Heat to make it so.  As the desired 
number of copies changes, the AS service changes the derived template that 
it hands to Heat.  Most commentators argue that the consistency and 
non-redundancy of making the AS service use Heat outweigh the extra 
path-length compared to a more direct solution.
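
As a purely illustrative sketch of that multiplication step (my own guess 
at the mechanics; the snippet format and the Heat client call below are 
placeholders, not the proposal's actual interfaces):

# Illustrative only: expand one resource snippet into N copies and hand the
# derived template to Heat; the AS service's integer count is the only state.
import copy

def build_derived_template(snippet, count):
    """snippet: dict describing one copy of the thing being autoscaled."""
    resources = {}
    for i in range(count):
        resources['scaled_member_%d' % i] = copy.deepcopy(snippet)
    return {'HeatTemplateFormatVersion': '2012-12-12', 'Resources': resources}

def on_webhook(heat_client, stack_id, snippet, new_count):
    # Each webhook call regenerates the derived template and updates the stack.
    template = build_derived_template(snippet, new_count)
    heat_client.stacks.update(stack_id, template=template)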

Heat will have a resource type, analogous to 
AWS::AutoScaling::AutoScalingGroup, through which the template author can 
request usage of the AS service.

OpenStack in general, and Heat in particular, need to be much better at 
traceability and debuggability; the AS service should be good at these 
too.

Have I got this right?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] How the autoscale API should control scaling in Heat

2013-09-19 Thread Mike Spreitzer
radix, thanks.  How exactly does the cooldown work?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Medium Availability VMs

2013-09-19 Thread Mike Spreitzer
 From: Tim Bell tim.b...@cern.ch
 ...
 Discussing with various people in the community, there seems to be 
 interest in a way to
 
 -  Identify when a hypervisor is being drained or is down 
 and inventory its VMs
 -  Find the best practise way of restarting that VM for 
 hypervisors still available
 o   Live migration
 o   Cold migration
 -  Defining policies for the remaining cases
 o   Restart from base image
 o   Suspend
 o   Delete
 
 This touches multiple components from Nova/Cinder/Quantum (at minimum).
 
 It also touches some cloud architecture questions if OpenStack can 
 start to move into the low hanging fruit parts of service consolidation.
 
 I’d like to have some form of summit discussion in Hong Kong around 
 these topics but it is not clear where it fits.
 
 Are there others who feel similarly ? How can we fit it in ?

When there are multiple viable choices, I think direction should be taken 
from higher layers.  The operation of draining a hypervisor can be 
parameterized, the VMs themselves can be tagged, by an indication of which 
to do.

I myself am working primarily on holistic infrastructure scheduling, which 
includes quiescing and draining hypervisors among the things it can do. 
Holistic scheduling works under the direction of a 
template/pattern/topology that describes a set of interacting resources 
and their relationships, and so is able to make a good decision about 
where VMs should move to.

Re-starting a VM can require software coordination.

I think holistic infrastructure scheduling is logically downstream from 
software coordination and upstream from infrastructure orchestration.  I 
think the ambitions for Heat are expanding to include the latter two, and 
so must also have something to do with holistic infrastructure scheduling.

Regards,
Mike
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-18 Thread Mike Spreitzer
Manuel, and others:
I am sorry, in the rush at the end of the scheduler meeting a critical 
fact flew from my mind: the material I distributed beforehand was intended 
as something I could reference during discussion in the meeting; I did not 
expect it to fully stand on its own.  Indeed, you have noticed that it 
does not.  It will take a little more time to write something that stands 
on its own.  I will try to get something out soon, including answers to 
your questions.

I should also make clear the overall sense of what I am doing.  I am in an 
in-between state.  My group has some running code on which I can report, 
but we are not satisfied with it for a few reasons.  One is that it is not 
integrated yet in any way with Heat, and I think the discussion we are 
having here overlaps with Heat.  Another is that it does not support very 
general changes, we have so far been solving initial deployment issues. We 
have been thinking about how to do better on these issues, and have an 
outline and are proceeding with the work; I can report on these too.  The 
things that concern me the most are issues of how to get architectural 
alignment with what the OpenStack community is doing.  So my main aim 
right now is to have a discussion of how the pieces fit together.  I am 
told that the OpenStack community likes to focus on small incremental 
changes, and that is a way to get things done, but I, at least, would like 
to get some sense of where this is going.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [heat] cross-stack references

2013-09-18 Thread Mike Spreitzer
When we get into things like affinity concerns or managing network 
bandwidth, we see the need for cross-stack relationships.  You may want to 
place parts of a new stack near parts of an existing one, for example.  I 
see that in CFN you can make cross-references between different parts of a 
single stack using the resource names that appear in the original 
template.  Is there a way to refer to something that did not come from the 
same original template?  If not, won't we need such a thing to be 
introduced?  Any thoughts on how that would be done?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] cross-stack references

2013-09-18 Thread Mike Spreitzer
My question is about stacks that are not nested.  Suppose, for example, 
that I create a stack that implements a shared service.  Later I create a 
separate stack that uses that shared service.  When creating that client 
stack, I would like to have a way of talking about its relationships with 
the service stack.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-17 Thread Mike Spreitzer
Fixed, sorry.




From:   Gary Kotton gkot...@vmware.com
To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
Date:   09/17/2013 03:26 AM
Subject:Re: [openstack-dev] [heat] [scheduler] Bringing things 
together for Icehouse



Hi,
The document is locked.
Thanks
Gary

From: Mike Spreitzer mspre...@us.ibm.com
Reply-To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org
Date: Tuesday, September 17, 2013 8:00 AM
To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [heat] [scheduler] Bringing things together 
for Icehouse

I have written a brief document, with pictures.  See 
https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps


Regards, 
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Change email address

2013-09-17 Thread Mike Spreitzer
Is it possible to change the email address I use in git and gerrit?  I 
think I started off with an inferior choice.  I have now taught LaunchPad 
and Gerrit that I have two email addresses.  The OpenStack Foundation 
appears a bit confused, but I'm hoping that's not critical.

I am stuck at the point on 
https://wiki.openstack.org/wiki/How_To_Contribute where it says, 
concerning signing the ICLA, "Your full name and E-mail address will be 
public (...) and the latter needs to match the user.email in your Git 
configuration."  Gerrit knows that I have signed the ICLA, and will not 
let me sign it again (I can not even try, it is grayed out).  Would it be 
correct to clarify the text I quoted above to say that one of your Gerrit 
email addresses has to match the one in your Git configuration?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Change email address

2013-09-17 Thread Mike Spreitzer
Thanks Anne.  Since I have already signed the ICLA, my real question is 
about what has to be true on an on-going basis for me to do developer 
stuff like reviewing and submitting patches.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Change email address, or, why I can't use github and will I be able to submit patches?

2013-09-17 Thread Mike Spreitzer
I am working through the instructions at 
https://wiki.openstack.org/wiki/GerritWorkflow - and things are going OK, 
including installing ~/.ssh/id_rsa.pub at 
https://review.openstack.org/#/settings/ssh-keys, without any linebreaks 
in the middle nor at the end - except it fails at the point where I test 
my ability to use github:

mjs9:~ mspreitz$ git config --list
user.name=Mike
user.email=mspre...@us.ibm.com
core.editor=emacs

mjs9:~ mspreitz$ ssh -T g...@github.com
Warning: Permanently added the RSA host key for IP address 
'192.30.252.131' to the list of known hosts.
Permission denied (publickey).


What's going wrong here?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Change email address, or, why I can't use github and will I be able to submit patches?

2013-09-17 Thread Mike Spreitzer
 From: Anne Gentle annegen...@justwriteclick.com
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Date: 09/17/2013 05:51 PM
 Subject: Re: [openstack-dev] Change email address, or, why I can't 
 use github and will I be able to submit patches?
 
 ...
 Github was experiencing issues earlier today. Nothing in our 
 GerritWorkflow requires ssh -T g...@github.com though. If you were 
 able to do a git clone, how did git review -s go for you? 

Both work.  So I guess I am in business.

Thanks!
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [savanna] Program name and Mission statement

2013-09-16 Thread Mike Spreitzer
"Data processing" is surely a superset of "big data".  Either, by itself, 
is way too vague.  But the wording that many people favor, which I will 
quote again, uses the vague term in a qualified way that makes it 
appropriately specific, IMHO.  Here is the wording again:

``To provide a simple, reliable and repeatable mechanism by which to 
deploy Hadoop and related Big Data projects, including management, 
monitoring and processing mechanisms driving further adoption of 
OpenStack.''

I think that saying "related Big Data projects" after "Hadoop" is fairly 
clear.  OTOH, I would not mind replacing "Hadoop and related Big Data 
projects" with "the Hadoop ecosystem".

Regards,
Mike

Matthew Farrellee m...@redhat.com wrote on 09/16/2013 02:39:20 PM:

 From: Matthew Farrellee m...@redhat.com
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Date: 09/16/2013 02:40 PM
 Subject: Re: [openstack-dev] [savanna] Program name and Mission 
statement
 
 IMHO, Big Data is even more nebulous and currently being pulled in many 
 directions. Hadoop-as-a-Service may be too narrow. So, something in 
 between, such as Data Processing, is a good balance.
 
 Best,
 
 
 matt
 
 On 09/13/2013 08:37 AM, Abhishek Lahiri wrote:
  IMHO data processing is too board , it makes more sense to clarify 
this
  program as big data as a service or simply 
openstack-Hadoop-as-a-service.
 
  Thanks  Regards
  Abhishek Lahiri
 
  On Sep 12, 2013, at 9:13 PM, Nirmal Ranganathan rnir...@gmail.com
  mailto:rnir...@gmail.com wrote:
 
 
 
 
  On Wed, Sep 11, 2013 at 8:39 AM, Erik Bergenholtz
  ebergenho...@hortonworks.com mailto:ebergenho...@hortonworks.com
  wrote:
 
 
  On Sep 10, 2013, at 8:50 PM, Jon Maron jma...@hortonworks.com
  mailto:jma...@hortonworks.com wrote:
 
  Openstack Big Data Platform
 
 
  On Sep 10, 2013, at 8:39 PM, David Scott
  david.sc...@cloudscaling.com
  mailto:david.sc...@cloudscaling.com wrote:
 
  I vote for 'Open Stack Data'
 
 
  On Tue, Sep 10, 2013 at 5:30 PM, Zhongyue Luo
  zhongyue@intel.com mailto:zhongyue@intel.com wrote:
 
  Why not OpenStack MapReduce? I think that pretty much 
says
  it all?
 
 
  On Wed, Sep 11, 2013 at 3:54 AM, Glen Campbell
  g...@glenc.io mailto:g...@glenc.io wrote:
 
  performant isn't a word. Or, if it is, it means
  having performance. I think you mean 
high-performance.
 
 
  On Tue, Sep 10, 2013 at 8:47 AM, Matthew Farrellee
  m...@redhat.com mailto:m...@redhat.com wrote:
 
  Rough cut -
 
  Program: OpenStack Data Processing
  Mission: To provide the OpenStack community with an
  open, cutting edge, performant and scalable data
  processing stack and associated management 
interfaces.
 
 
  Proposing a slightly different mission:
 
  To provide a simple, reliable and repeatable mechanism by which 
to
  deploy Hadoop and related Big Data projects, including 
management,
  monitoring and processing mechanisms driving further adoption of
  OpenStack.
 
 
 
  +1. I liked the data processing aspect as well, since EDP api 
directly
  relates to that, maybe a combination of both.
 
 
 
  On 09/10/2013 09:26 AM, Sergey Lukjanov wrote:
 
  It sounds too broad IMO. Looks like we need to
  define Mission Statement
  first.
 
  Sincerely yours,
  Sergey Lukjanov
  Savanna Technical Lead
  Mirantis Inc.
 
  On Sep 10, 2013, at 17:09, Alexander Kuznetsov
  akuznet...@mirantis.com
  mailto:akuznet...@mirantis.com
  mailto:akuznetsov@mirantis.__com
  mailto:akuznet...@mirantis.com wrote:
 
  My suggestion OpenStack Data Processing.
 
 
  On Tue, Sep 10, 2013 at 4:15 PM, Sergey 
Lukjanov
  slukja...@mirantis.com
  mailto:slukja...@mirantis.com
  mailto:slukja...@mirantis.com
  mailto:slukja...@mirantis.com__ wrote:
 
  Hi folks,
 
  due to the Incubator Application we
  should prepare Program name
  and Mission statement for Savanna, so, 
I
  want to start mailing
  thread about it.
 
  Please, provide any ideas here.
 
  P.S. List of existing programs:
  https://wiki.openstack.org/__wiki/Programs
  https://wiki.openstack.org/wiki/Programs
  P.P.S.
  

Re: [openstack-dev] [Tuskar] Tuskar Names Clarification Unification

2013-09-16 Thread Mike Spreitzer
 From: Jaromir Coufal jcou...@redhat.com
 To: openstack-dev@lists.openstack.org, 
 Date: 09/16/2013 11:51 AM
 Subject: Re: [openstack-dev] [Tuskar] Tuskar Names Clarification  
Unification
 
 Hi,
 
 after few days of gathering information, it looks that no more new 
 ideas appear there, so let's take the last round of voting for names
 which you prefer. It's important for us to get on the same page.

I am concerned that the proposals around the term 'rack' do not recognize 
that there might be more than one layer in the organization.

Is it more important to get appropriately abstract and generic terms, or 
is the desire to match common concrete terms?

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-16 Thread Mike Spreitzer
I have written a brief document, with pictures.  See 
https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-15 Thread Mike Spreitzer
I've read up on recent goings-on in the scheduler subgroup, and have some 
thoughts to contribute.

But first I must admit that I am still a newbie to OpenStack, and still am 
missing some important clues.  One thing that mystifies me is this: I see 
essentially the same thing, which I have generally taken to calling 
holistic scheduling, discussed in two mostly separate contexts: (1) the 
(nova) scheduler context, and (2) the ambitions for heat.  What am I 
missing?

I have read the Unified Resource Placement Module document (at 
https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#
) and NovaSchedulerPerspective document (at 
https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu
).  My group already has running code along these lines, and thoughts for 
future improvements, so I'll mention some salient characteristics.  I have 
read the etherpad at 
https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions - and I 
hope my remarks will help fit these topics together.

Our current code uses one long-lived process to make placement decisions. 
The information it needs to do this job is pro-actively maintained in its 
memory.  We are planning to try replacing this one process with a set of 
equivalent processes; we are not sure how well it will work out (we are a 
research group).

We make a distinction between desired state, target state, and observed 
state.  The desired state comes in through REST requests, each giving a 
full virtual resource topology (VRT).  A VRT includes constraints that 
affect placement, but does not include actual placement decisions.  Those 
are made by what we call the placement agent.  Yes, it is separate from 
orchestration (even in the first architecture figure in the u-rpm document 
the orchestration is separate --- the enclosing box does not abate the 
essential separateness).  In our architecture, orchestration is downstream 
from placement (as in u-rpm).  The placement agent produces target state, 
which is essentially desired state augmented by placement decisions. 
Observed state is what comes from the lower layers (Software Defined 
Compute, Storage, and Network).  We mainly use OpenStack APIs for the 
lower layers, and have added a few local extensions to make the whole 
story work.

The placement agent judges available capacity by subtracting current 
allocations from raw capacity.  The placement agent maintains in its 
memory a derived thing we call effective state; the allocations in 
effective state are the union of the allocations in target state and the 
allocations in observed state.  Since the orchestration is downstream, 
some of the planned allocations are not in observed state yet.  Since 
other actors can use the underlying cloud, and other weird sh*t happens, 
not all the allocations are in target state.  That's why placement is done 
against the union of the allocations.  This is somewhat conservative, but 
the alternatives are worse.
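
A minimal sketch of that union and of the resulting free-capacity 
computation (simplified to bare dicts and vectors; not our actual data 
model):

# Simplified sketch: an allocation counts toward a container if it appears in
# either the target state or the observed state (each resource counted once).
def effective_allocations(target, observed):
    """target, observed: dict mapping (container, resource_id) -> demand vector."""
    effective = dict(observed)
    effective.update(target)
    return effective

def free_capacity(container, raw_capacity, effective):
    """Available capacity = raw capacity minus every effective allocation there."""
    free = list(raw_capacity)
    for (c, _rid), demand in effective.items():
        if c == container:
            free = [f - d for f, d in zip(free, demand)]
    return free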

Note that placement is concerned with allocations rather than current 
usage.  Current usage fluctuates much faster than you would want placement 
to.  Placement needs to be done with a long-term perspective.  Of course, 
that perspective can be informed by usage information (as well as other 
sources) --- but it remains a distinct thing.

We consider all our copies of observed state to be soft --- they can be 
lost and reconstructed at any time, because the true source is the 
underlying cloud.  Which is not to say that reconstructing a copy is 
cheap.  We prefer making incremental updates as needed, rather than 
re-reading the whole thing.  One of our local extensions adds a mechanism 
by which a client can register to be notified of changes in the Software 
Defined Compute area.

The target state, on the other hand, is stored authoritatively by the 
placement agent in a database.

We pose placement as a constrained optimization problem, with a non-linear 
objective.  We approximate its solution with a very generic algorithm; it 
is easy to add new kinds of constraints and new contributions to the 
objective.

The core placement problem is about packing virtual resources into 
physical containers (e.g., VMs into hosts, volumes into Cinder backends). 
A virtual resource has a demand vector, and a corresponding container has 
a capacity vector of the same length.  For a given container, the sum of 
the demand vectors of the virtual resources in that container can not 
exceed the container's capacity vector in any dimension.  We can add 
dimensions as needed to handle the relevant host/guest characteristics.
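
The feasibility check itself is just an element-wise comparison; a small 
sketch (our real solver treats this as one constraint among many in the 
optimization, so this shows only the idea, not the implementation):

# Sketch of the packing constraint: the element-wise sum of the demand vectors
# placed in a container must not exceed the container's capacity vector.
def fits(capacity, demands):
    """capacity: capacity vector; demands: list of demand vectors for one container."""
    totals = [sum(dim) for dim in zip(*demands)] if demands else [0] * len(capacity)
    return all(t <= c for t, c in zip(totals, capacity))

# e.g. with dimensions (vcpus, memory_mb, local_gb):
# fits((16, 65536, 2000), [(4, 8192, 100), (8, 16384, 200)])  ->  True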

We are just now working an example where a Cinder volume can be required 
to be the only one hosted on whatever Cinder backend hosts it.  This is 
exactly analogous to requiring that a VM (bare metal or otherwise) be the 
only one hosted by whatever PM hosts it.

We favor a fairly expressive language for stating desired 

Re: [openstack-dev] [heat] [scheduler] (How to talk about) Bringing things together for Icehouse

2013-09-15 Thread Mike Spreitzer
As I mentioned the last time this was brought up, I already have a meeting 
series that conflicts with the scheduler group chats and will be hard to 
move; that is why I have been trying to participate asynchronously.  But 
since Gary asked again, I am seeing what I can do about that other meeting 
series.  Unless and until something gives, I will have to continue 
participating asynchronously.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

2013-09-15 Thread Mike Spreitzer
 From: Gary Kotton gkot...@vmware.com
 ...
 Can you please join us at the up and coming scheduler meeting. That 
 will give you a chance to bring up the idea's and discuss them with 
 a larger audience.

I will do so on Sep 17.  Later meetings still TBD.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [savanna] Host information for non admin users - from a holistic scheduler

2013-09-14 Thread Mike Spreitzer
Alex, my understanding is that the motivation for rack-awareness in Hadoop 
is optimizing availability rather than networking.  The good news, for 
those of us who favor a holistic scheduler, is that it can take both sorts 
of things into account when/where desired.

Yes, the case of a public cloud is more difficult than the case of a 
private cloud.  My understanding of Amazon's attitude, for example, is 
that they do not want to give out any bits of information about placement 
--- even though there are known techniques to reverse-engineer it, Amazon 
does not want to help that along at all.  Giving out obscured information 
--- some bits but not all --- is still disfavored.  Let me give a little 
background on how my group deals with placement for availability, then 
discuss options for the public cloud.

Our holistic scheduler takes as input something we call a virtual resource 
topology (VRT), other people use words like pattern, template, 
application, and cluster for such a thing.  It is an assembly of virtual 
resources that one tenant wants to instantiate.  In a VRT the resources 
are arranged into a tree of groups, the VRT itself is the root.  We use 
the groups for concise statements of various sorts, which I will omit here 
for the sake of simplicity.  As far as direct location constraints are 
concern, there is just one primitive thing: it is a relationship between 
two virtual resources and it is parameterized by a sense (positive or 
negative) and a level in the physical hierarchy (e.g., physical machine 
(PM), chassis, rack).  Thus: a negative relationship between VM1 and VM2 
at the rack level means that VM1 and VM2 must go on different racks; a 
positive relationship between VM3 and VM4 at the PM level means those two 
VMs must be on the same host.  Additionally, each constraint can be hard 
or soft: a hard constraint must be satisfied while a soft constraint is a 
preference.

Consider the example of six interchangeable VMs (say VM1, VM2, ... VM6) 
that should be spread across at least two racks with no more than half the 
VMs on any one rack.  How to say that with a collection of location 
primitives?  What we do is establish three rack-level anti-co-location 
constraints: one between VM1 and VM2, one between VM3 and VM4, and one 
between VM5 and VM6.  That is not the most obvious representation.  You 
might have expected this: nine rack-level anti-co-location constraints, 
one for every pair in the outer product between {VM1, VM2, VM3} and {VM4, 
VM5, VM6}.  Now consider what happens if the physical system has three 
racks and room for only two additional VMs on each rack.  With the latter 
set of constraints, there is no acceptable placement.  With the sparser 
set that we use, there are allowed placements.  In short, an obvious set 
of constraints may rule out otherwise acceptable placement.
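
To make the comparison concrete, here is a small sketch (rack and VM names 
invented) that checks a candidate placement against each constraint set; 
with three racks having two free slots each, the sparse set admits a 
placement while the dense set admits none:

# Each constraint is a pair of VMs that must land on different racks.
def satisfies(placement, constraints):
    """placement: dict vm -> rack; constraints: iterable of (vm_a, vm_b) pairs."""
    return all(placement[a] != placement[b] for a, b in constraints)

sparse = [('VM1', 'VM2'), ('VM3', 'VM4'), ('VM5', 'VM6')]
dense = [(a, b) for a in ('VM1', 'VM2', 'VM3') for b in ('VM4', 'VM5', 'VM6')]

# Three racks, each with room for only two of the six VMs:
placement = {'VM1': 'r1', 'VM2': 'r2', 'VM3': 'r2',
             'VM4': 'r3', 'VM5': 'r3', 'VM6': 'r1'}
print(satisfies(placement, sparse))   # True: each constrained pair is split
print(satisfies(placement, dense))    # False: e.g. VM1 and VM6 share rack r1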

I see two ways to give guaranteed-accurate rack awareness to Hadoop: 
constrain the placement so tightly that you know enough to configure 
Hadoop before the placement decision is made, or extract placement 
information after the placement decision is made.  The public cloud 
setting rules out the latter, leaving only the former.  This can be done, 
at a cost of suffering pattern rejections that would not occur if you did 
not have to over-constrain the placement.

One more option is to give up on guaranteed accuracy: prescribe a 
placement with sufficient precision to inform Hadoop, and so inform 
Hadoop, but make that prescription a preference rather than a hard 
constraint.  If the actual placement does not fully meet all the 
preferences, Hadoop is not informed of the differences and so will suffer 
in non-functional ways but still get the job done (modulo all those 
non-functional considerations, like tolerating a rack crash).  When your 
preferences are not met, it is because the system is very loaded and your 
only choice is between operating in some degraded way or not at all --- 
you might as well take the degraded operation.

Regards,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint

2013-09-12 Thread Mike Spreitzer
We are currently explicitly considering location and space.  For example, 
a template can require that a volume be in a disk that is directly 
attached to the machine hosting the VM to which the volume is attached. 
Spinning rust bandwidth is much trickier because it is not something you 
can simply add up when you combine workloads.  The IOPS, as well as the 
B/S, that a disk will deliver depends on the workload mix on that disk. 
While the disk may deliver X IOPS when serving only application A, and Y 
when serving only application B, you cannot conclude that it will serve 
(X+Y)/2 when serving (A+B)/2.  While we hope to do better in the future, 
we currently handle disk bandwidth in non-quantitative ways.  One is that 
a template may request that a volume be placed such that it does not 
compete with any other volume (i.e., is the only one on its disk). Another 
is that a template may specify a type for a volume, which effectively 
maps to a Cinder volume type that has been pre-defined to correspond to a 
QoS defined in an enterprise storage subsystem.

The choice between fast/expensive vs. slow/cheap storage is currently left 
to higher layers.  That could be pushed down, supposing there is a 
suitably abstract yet accurate way of describing how the tradeoff choice 
should be made.

I think Savanna people are on this list too, so I presume it's a good 
place for this discussion.

Thanks,
Mike



From:   shalz sh...@hotmail.com
To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
Date:   09/11/2013 09:55 PM
Subject:Re: [openstack-dev] [heat] Comments/questions on the 
instance-group-api-extension blueprint



Mike,

You mention  We are now extending that example to include storage, and we 
are also working examples with Hadoop. 

In the context of your examples / scenarios, do these placement decisions 
consider storage performance and capacity on a physical node?

For example: Based on application needs, and IOPS, latency requirements - 
carving out a SSD storage or a traditional spinning disk block volume?  Or 
say for cost-efficiency reasons using SSD caching on Hadoop name nodes? 

I'm investigating  a) Per node PCIe SSD deployment need in Openstack 
environment /  Hadoop environment and ,b) selected node SSD caching, 
specifically for OpenStack Cinder.  Hope this is the right forum to ask 
this question.

rgds,
S

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint

2013-09-12 Thread Mike Spreitzer
Gary Kotton gkot...@vmware.com wrote on 09/12/2013 05:40:59 AM:

 From: Gary Kotton gkot...@vmware.com
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Date: 09/12/2013 05:46 AM
 Subject: Re: [openstack-dev] [heat] Comments/questions on the 
 instance-group-api-extension blueprint
 
 Hi,
 For some reason I am unable to access your proceed talk. I am not 
 100% sure but I think that the voting may be closed. We have weekly 
 scheduling meetings (https://wiki.openstack.org/wiki/
 Meetings#Scheduler_Sub-group_meeting). It would be nice if you could
 attend and it will give you a platform to raise and share ideas with
 the rest of the guys in the community.
 At the moment the scheduling subgroup is working  on our ideas for 
 the design summit sessions. Please see https://
 etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions
 Thanks
 Gary

Worse yet, I know of no way to navigate to a list of design summit 
proposals.  What am I missing?

The scheduler group meeting conflicts with another meeting that I already 
have and will be difficult to move.  I will see what I can do 
asynchronously.

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [savanna] Host information for non admin users

2013-09-12 Thread Mike Spreitzer
 From: Nirmal Ranganathan rnir...@gmail.com
 ...
 Not host capacity, just a opaque reference to distinguish a host is 
 enough. Hadoop can use that information to appropriately place block
 replicas. For example if the replication count is 3, and if a host/
 rack topology is provided to Hadoop, it will place each replica on a
 different host/rack granted one is available.

What if there are more than three racks, but some are better choices than 
others (perhaps even some are ruled out) due to considerations of various 
sorts of capacity and usage?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [savanna] Program name and Mission statement

2013-09-11 Thread Mike Spreitzer
 To provide a simple, reliable and repeatable mechanism by which to 
 deploy Hadoop and related Big Data projects, including management, 
 monitoring and processing mechanisms driving further adoption of 
OpenStack.

That sounds like it is at about the right level of specificity.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [savanna] Program name and Mission statement

2013-09-10 Thread Mike Spreitzer
A quick dictionary lookup of data processing yields the following.  I 
wonder if you mean something more specific.

data processing |ˈˌdædə ˈprɑsɛsɪŋ|
noun
a series of operations on data, esp. by a computer, to retrieve, 
transform, or classify information.



From:   Matthew Farrellee m...@redhat.com
To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
Date:   09/10/2013 09:53 AM
Subject:Re: [openstack-dev] [savanna] Program name and Mission 
statement



Rough cut -

Program: OpenStack Data Processing
Mission: To provide the OpenStack community with an open, cutting edge, 
performant and scalable data processing stack and associated management 
interfaces.

On 09/10/2013 09:26 AM, Sergey Lukjanov wrote:
 It sounds too broad IMO. Looks like we need to define Mission Statement
 first.

 Sincerely yours,
 Sergey Lukjanov
 Savanna Technical Lead
 Mirantis Inc.

 On Sep 10, 2013, at 17:09, Alexander Kuznetsov akuznet...@mirantis.com
 mailto:akuznet...@mirantis.com wrote:

 My suggestion OpenStack Data Processing.


 On Tue, Sep 10, 2013 at 4:15 PM, Sergey Lukjanov
 slukja...@mirantis.com mailto:slukja...@mirantis.com wrote:

 Hi folks,

 due to the Incubator Application we should prepare Program name
 and Mission statement for Savanna, so, I want to start mailing
 thread about it.

 Please, provide any ideas here.

 P.S. List of existing programs:
 https://wiki.openstack.org/wiki/Programs
 P.P.S. https://wiki.openstack.org/wiki/Governance/NewPrograms

 Sincerely yours,
 Sergey Lukjanov
 Savanna Technical Lead
 Mirantis Inc.


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 mailto:OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 mailto:OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [heat] Comments/questions on the instance-group-api-extension blueprint

2013-09-10 Thread Mike Spreitzer
First, I'm a newbie here, wondering: is this the right place for 
comments/questions on blueprints?  Supposing it is...

I am referring to 
https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension

In my own research group we have experience with a few systems that do 
something like that, and more (as, indeed, that blueprint explicitly 
states that it is only the start of a longer roadmap).  I would like to 
highlight a couple of differences that alarm me.  One is the general 
overlap between groups.  I am not saying this is wrong, but as a matter of 
natural conservatism we have shied away from unnecessary complexities. The 
only overlap we have done so far is hierarchical nesting.  As the 
instance-group-api-extension explicitly contemplates groups of groups as a 
later development, this would cover the overlap that we have needed.  On 
the other hand, we already have multiple policies attached to a single 
group.  We have policies for a variety of concerns, so some can combine 
completely or somewhat independently.  We also have relationships (of 
various sorts) between groups (as well as between individuals, and between 
individuals and groups).  The policies and relationships, in general, are 
not simply names but also have parameters.
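
A toy illustration of the kind of model I mean (all names and parameters 
invented for the example, not our actual schema):

# Toy sketch: groups may nest, carry several parameterized policies, and be
# related to other groups or individuals by parameterized relationships.
class Policy:
    def __init__(self, kind, **params):
        self.kind, self.params = kind, params

class Relationship:
    def __init__(self, kind, source, target, **params):
        self.kind, self.source, self.target, self.params = kind, source, target, params

class Group:
    def __init__(self, name, members=(), policies=()):
        self.name = name
        self.members = list(members)      # individuals or nested Groups
        self.policies = list(policies)    # multiple policies can combine on one group

web = Group('web-tier', members=['vm1', 'vm2'],
            policies=[Policy('anti-collocation', level='rack'),
                      Policy('license-cap', count=10)])
db = Group('db-tier', members=['vm3'])
proximity = Relationship('network-proximity', web, db, max_hops=2)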

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [savanna] Program name and Mission statement

2013-09-10 Thread Mike Spreitzer
Jon Maron jma...@hortonworks.com wrote on 09/10/2013 08:50:23 PM:

 From: Jon Maron jma...@hortonworks.com
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Cc: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org
 Date: 09/10/2013 08:55 PM
 Subject: Re: [openstack-dev] [savanna] Program name and Mission 
statement
 
 Openstack Big Data Platform

Let's see if you mean that.  Does this project aim to cover big data 
things besides MapReduce?  Can you give examples of other things that are 
in scope?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Scaling of TripleO

2013-09-09 Thread Mike Spreitzer
Robert Collins robe...@robertcollins.net wrote on 09/06/2013 05:31:14 
PM:

 From: Robert Collins robe...@robertcollins.net
 To: OpenStack Development Mailing List 
openstack-dev@lists.openstack.org, 
 Date: 09/06/2013 05:36 PM
 Subject: Re: [openstack-dev] [tripleo] Scaling of TripleO
 
 ...
 My vision for TripleO/undercloud and scale in the long term is:
 - A fully redundant self-healing undercloud
   - (implies self hosting)
...

Robert, what do you mean by self hosting?  If a cloud can self-host, why 
do we need two clouds (under and over)?

Thanks,
Mike___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Frustrations with review wait times

2013-08-27 Thread Mike Spreitzer
Joshua, I do not think such strict and coarse scheduling is a practical 
way to manage developers, who have highly individualized talents, 
backgrounds, and interests.

Regards,
Mike

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Stats on blueprint design info / creation times

2013-08-21 Thread Mike Spreitzer
For the case of an item that has no significant doc of its own but is 
related to an extensive blueprint, how about linking to that extensive 
blueprint?
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

