Hi,
Can you please join us at the upcoming scheduler meeting? That will give 
you a chance to bring up the ideas and discuss them with a larger audience.
https://wiki.openstack.org/wiki/Meetings#Scheduler_Sub-group_meeting
I think that for the summit it would be a good idea if we could also have at 
least one session with the Heat folks to see how we can combine efforts.
Thanks
Gary

From: Mike Spreitzer <[email protected]>
Reply-To: OpenStack Development Mailing List <[email protected]>
Date: Sunday, September 15, 2013 10:19 AM
To: OpenStack Development Mailing List <[email protected]>
Subject: [openstack-dev] [heat] [scheduler] Bringing things together for Icehouse

I've read up on recent goings-on in the scheduler subgroup, and have some 
thoughts to contribute.

But first I must admit that I am still a newbie to OpenStack and am still 
missing some important clues.  One thing that mystifies me is this: I see 
essentially the same thing, which I have generally taken to calling holistic 
scheduling, discussed in two mostly separate contexts: (1) the (Nova) scheduler 
context, and (2) the ambitions for Heat.  What am I missing?

I have read the Unified Resource Placement Module document (at 
https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit?pli=1#)
 and the NovaSchedulerPerspective document (at 
https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?pli=1#heading=h.6ixj0ctv4rwu).
  My group already has running code along these lines, and thoughts for future 
improvements, so I'll mention some salient characteristics.  I have also read the 
etherpad at https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions, 
and I hope my remarks will help fit these topics together.

Our current code uses one long-lived process to make placement decisions.  The 
information it needs to do this job is proactively maintained in its memory.  
We are planning to try replacing this one process with a set of equivalent 
processes; we are not sure yet how well that will work out (we are a research group).

We make a distinction between desired state, target state, and observed state.  
The desired state comes in through REST requests, each giving a full virtual 
resource topology (VRT).  A VRT includes constraints that affect placement, but 
does not include actual placement decisions.  Those are made by what we call 
the placement agent.  Yes, it is separate from orchestration (even in the first 
architecture figure in the u-rpm document the orchestration is drawn separately 
--- the enclosing box does not negate the essential separateness).  In our architecture, 
orchestration is downstream from placement (as in u-rpm).  The placement agent 
produces target state, which is essentially desired state augmented by 
placement decisions.  Observed state is what comes from the lower layers 
(Software Defined Compute, Storage, and Network).  We mainly use OpenStack APIs 
for the lower layers, and have added a few local extensions to make the whole 
story work.
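
To make the three kinds of state concrete, here is a minimal Python sketch of
the distinction; the class and field names are my own illustrative assumptions,
not our actual code:

    from dataclasses import dataclass, field

    @dataclass
    class VirtualResource:
        """One virtual resource (VM, volume, ...) in a VRT."""
        name: str
        demand: dict                    # e.g. {"cpu": 2, "ram_mb": 4096}
        constraints: list = field(default_factory=list)

    @dataclass
    class DesiredState:
        """A full virtual resource topology: constraints, but no placements."""
        resources: list                 # [VirtualResource, ...]

    @dataclass
    class TargetState:
        """Desired state augmented with the placement agent's decisions."""
        desired: DesiredState
        placement: dict                 # resource name -> container

    @dataclass
    class ObservedState:
        """What the lower layers report is actually allocated."""
        allocations: dict               # (container, resource name) -> demand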

The placement agent judges available capacity by subtracting current 
allocations from raw capacity.  The placement agent maintains in its memory a 
derived thing we call effective state; the allocations in effective state are 
the union of the allocations in target state and the allocations in observed 
state.  Since the orchestration is downstream, some of the planned allocations 
are not in observed state yet.  Since other actors can use the underlying 
cloud, and other weird sh*t happens, not all the allocations are in target 
state.  That's why placement is done against the union of the allocations.  
This is somewhat conservative, but the alternatives are worse.
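
As a sketch of that union (hedged --- this assumes allocations are dicts keyed
by (container, resource name) with demand vectors as values, and that when a
key appears in both states the target's entry wins):

    def effective_allocations(target_allocs, observed_allocs):
        """Union of target and observed allocations.

        An allocation counted in both states is counted once; planned-but-
        not-yet-observed and observed-but-not-planned allocations both
        consume capacity."""
        effective = dict(observed_allocs)
        effective.update(target_allocs)     # same key in both -> one entry
        return effective

    def available_capacity(raw_capacity, effective, container):
        """Raw capacity of one container minus its effective allocations."""
        free = dict(raw_capacity[container])
        for (cont, _res), demand in effective.items():
            if cont == container:
                for dim, amount in demand.items():
                    free[dim] -= amount
        return free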

Note that placement is concerned with allocations rather than current usage.  
Current usage fluctuates much faster than you would want placement to.  
Placement needs to be done with a long-term perspective.  Of course, that 
perspective can be informed by usage information (as well as other sources) --- 
but it remains a distinct thing.

We consider all our copies of observed state to be soft --- they can be lost 
and reconstructed at any time, because the true source is the underlying cloud. 
 Which is not to say that reconstructing a copy is cheap.  We prefer making 
incremental updates as needed, rather than re-reading the whole thing.  One of 
our local extensions adds a mechanism by which a client can register to be 
notified of changes in the Software Defined Compute area.
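
A minimal sketch of the soft-copy idea, assuming a hypothetical cloud handle
and notification event shape (our extension's actual API is not shown here):

    class ObservedStateCache:
        """Soft copy of observed state: it can be thrown away and rebuilt
        from the underlying cloud (the source of truth), and is kept fresh
        by incremental change notifications when possible."""

        def __init__(self, cloud):
            self.cloud = cloud
            self.allocations = None         # None means "lost, must rebuild"

        def rebuild(self):
            # Expensive full read of the underlying cloud.
            self.allocations = self.cloud.list_all_allocations()

        def on_change(self, event):
            # Cheap incremental update from the notification extension.
            if self.allocations is None:
                return                      # will be rebuilt lazily anyway
            if event.kind == "allocated":
                self.allocations[event.key] = event.demand
            elif event.kind == "freed":
                self.allocations.pop(event.key, None)

        def get(self):
            if self.allocations is None:
                self.rebuild()
            return self.allocations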

The target state, on the other hand, is stored authoritatively by the placement 
agent in a database.

We pose placement as a constrained optimization problem, with a non-linear 
objective.  We approximate its solution with a very generic algorithm; it is 
easy to add new kinds of constraints and new contributions to the objective.
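
To show the pluggable shape I mean (a greedy sketch under assumed names, not
our actual solver): constraints and objective contributions are just callables,
so adding a new kind of either is one more function:

    def place(resources, containers, constraints, objective_terms, state):
        """Greedy approximation: put each resource in the feasible
        container that scores best under the (possibly non-linear)
        objective.

        constraints:     [fn(resource, container, state) -> bool]
        objective_terms: [fn(resource, container, state) -> float]"""
        placement = {}
        for res in resources:
            feasible = [c for c in containers
                        if all(ok(res, c, state) for ok in constraints)]
            if not feasible:
                raise RuntimeError("no feasible container for %s" % res.name)
            best = min(feasible,
                       key=lambda c: sum(f(res, c, state)
                                         for f in objective_terms))
            placement[res.name] = best
            state.commit(res, best)   # later decisions see this allocation
        return placement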

The core placement problem is about packing virtual resources into physical 
containers (e.g., VMs into hosts, volumes into Cinder backends).  A virtual 
resource has a demand vector, and a corresponding container has a capacity 
vector of the same length.  For a given container, the sum of the demand 
vectors of the virtual resources in that container cannot exceed the 
container's capacity vector in any dimension.  We can add dimensions as needed 
to handle the relevant host/guest characteristics.
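
Concretely, the per-container feasibility test is a vector comparison.  A
minimal sketch (the dimension names are just examples):

    def fits(capacity, current_demands, new_demand):
        """True iff adding new_demand keeps the per-dimension sum of
        demands at or below the container's capacity."""
        for dim, cap in capacity.items():
            used = sum(d.get(dim, 0) for d in current_demands)
            if used + new_demand.get(dim, 0) > cap:
                return False
        return True

    # Example: VMs into a host with cpu and ram dimensions.
    host = {"cpu": 16, "ram_mb": 65536}
    placed = [{"cpu": 4, "ram_mb": 8192}, {"cpu": 8, "ram_mb": 16384}]
    print(fits(host, placed, {"cpu": 4, "ram_mb": 8192}))   # True
    print(fits(host, placed, {"cpu": 8, "ram_mb": 8192}))   # False: cpu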

We are just now working through an example where a Cinder volume can be required to be 
the only one hosted on whatever Cinder backend hosts it.  This is exactly 
analogous to requiring that a VM (bare metal or otherwise) be the only one 
hosted by whatever PM hosts it.
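
In the pluggable-constraint shape sketched above, that exclusivity could be
one more callable (hypothetical names again):

    def exclusive_constraint(resource, container, state):
        """Feasible only if an exclusive resource gets an empty container,
        and a non-exclusive one stays out of a claimed container; the same
        function covers a sole-tenant volume and a sole-tenant VM."""
        occupants = state.occupants(container)
        if "exclusive" in resource.constraints:
            return not occupants
        return not any("exclusive" in r.constraints for r in occupants)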

We favor a fairly expressive language for stating desired policies and 
relationships in VRTs.  We think this is necessary when you move beyond simple 
examples to more realistic ones.  We do not favor chopping the cloud up into 
little pieces due to inexpressiveness in the VRT language.
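
As a purely hypothetical illustration of the kind of expressiveness I mean
(not our actual syntax), a VRT fragment might state relationships directly
rather than forcing the cloud to be pre-partitioned:

    # One sole-tenant volume, two VMs that must land on distinct hosts,
    # and locality relationships tying each VM near the volume.
    vrt = {
        "volumes": {"db_vol": {"size_gb": 500, "constraints": ["exclusive"]}},
        "vms": {
            "db1": {"cpu": 8, "ram_mb": 32768},
            "db2": {"cpu": 8, "ram_mb": 32768},
        },
        "relationships": [
            {"type": "anti-collocate", "on": "host", "members": ["db1", "db2"]},
            {"type": "near", "members": [["db1", "db_vol"], ["db2", "db_vol"]]},
        ],
    }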

Regards,
Mike
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
