[openstack-dev] [Nova] [Gantt][Scheduler-split] Why we need a Smart Placement Engine as a Service! (was: Scheduler split status (updated))

Yathiraj Udupi (yudupi) Mon, 14 Jul 2014 19:27:04 -0700

Hi all,

Adding to the interesting discussion thread regarding the scheduler split and 
its importance, I would like to pitch in a couple of thoughts in favor of 
Gantt. At the Icehouse summit in HKG, in one of the scheduler design sessions, 
I along with a few others (cc’d) presented a session on Smart Resource 
Placement 
(https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement), where 
we proposed a Smart Placement Decision Engine as a Service, addressing 
cross-service scheduling as one of the use cases. We pitched the idea of a 
stand-alone service acting as a smart resource placement engine (see figure: 
https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1)
that can use state data from all the services and make a unified placement 
decision. We have even proposed a separate blueprint 
(https://blueprints.launchpad.net/nova/+spec/solver-scheduler with working code 
now here: https://github.com/CiscoSystems/nova-solver-scheduler) called Smart 
Scheduler (Solver Scheduler), whose goal is to do smart resource placement 
taking into account complex constraints spanning compute (nova), storage 
(cinder), and network. The existing Filter Scheduler, or projects like the 
Smart (Solver) Scheduler (for covering the complex constraint scenarios), could 
easily fulfill the decision-making aspects of the placement engine.

I believe the Gantt project is the right direction in terms of separating out 
the placement decision concern, and creating a separate scheduler as a service, 
so that it can freely talk to any of the other services, or use a unified 
global state repository and make the unified decision.  Projects like 
Smart(Solver) Scheduler can easily fit into the Gantt Project as pluggable 
drivers to add the additional smarts required.

To offer our Smart Scheduler as a service, we have prototyped a RESTful 
interface to it that is detached from Nova (loosely connected). 
For example, a RESTful request like this (where I am asking for 2 VMs with a 
requirement of 1 GB disk, plus another request for 1 VM of flavor 'm1.tiny' 
that also has a special requirement that it should be close to the volume with 
uuid "ef6348300bc511e4bc4cc03fd564d1bc" (Compute-Volume affinity constraint)):

curl -i -H "Content-Type: application/json" -X POST -d '{"instance_requests": 
[{"num_instances": 2, "request_properties": {"instance_type": {"root_gb": 1}}}, 
{"num_instances": 1, "request_properties": {"flavor": "m1.tiny", 
"volume_affinity": "ef6348300bc511e4bc4cc03fd564d1bc"}}]}' \
http://<x.x.x.x>/smart-scheduler-as-a-service/v1.0/placement

provides a placement decision something like this:

{
  "result": [
    [
      {
        "host": {
          "host": "Host1",
          "nodename": "Node1"
        },
        "instance_uuid": "VM_ID_0_0"
      },
      {
        "host": {
          "host": "Host2",
          "nodename": "Node2"
        },
        "instance_uuid": "VM_ID_0_1"
      }
    ],
    [
      {
        "host": {
          "host": "Host1",
          "nodename": "Node1"
        },
        "instance_uuid": "VM_ID_1_0"
      }
    ]
  ]
}

This placement result can be used by Nova to proceed and complete the 
scheduling.
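
For anyone who wants to script this, here is a minimal Python sketch of 
building the request body shown above and walking the response. The helper 
names (build_placement_request, hosts_by_instance) are hypothetical and are 
not part of the Smart Scheduler code; only the JSON field names are taken from 
the example, and no HTTP call is made.

```python
import json


def build_placement_request(instance_requests):
    """Serialize (num_instances, request_properties) pairs into the
    JSON body used in the curl example above."""
    body = {
        "instance_requests": [
            {"num_instances": count, "request_properties": props}
            for count, props in instance_requests
        ]
    }
    return json.dumps(body)


def hosts_by_instance(response_text):
    """Flatten the nested "result" lists into
    {instance_uuid: (host, nodename)}."""
    placements = {}
    for request_group in json.loads(response_text)["result"]:
        for entry in request_group:
            host = entry["host"]
            placements[entry["instance_uuid"]] = (host["host"], host["nodename"])
    return placements


# Same two requests as in the curl example.
request_body = build_placement_request([
    (2, {"instance_type": {"root_gb": 1}}),
    (1, {"flavor": "m1.tiny",
         "volume_affinity": "ef6348300bc511e4bc4cc03fd564d1bc"}),
])

# Sample response copied from the placement decision above.
sample_response = json.dumps({"result": [
    [{"host": {"host": "Host1", "nodename": "Node1"},
      "instance_uuid": "VM_ID_0_0"},
     {"host": {"host": "Host2", "nodename": "Node2"},
      "instance_uuid": "VM_ID_0_1"}],
    [{"host": {"host": "Host1", "nodename": "Node1"},
      "instance_uuid": "VM_ID_1_0"}],
]})

placements = hosts_by_instance(sample_response)
```

Each inner list of "result" corresponds to one entry of "instance_requests", 
so a caller can also keep the grouping instead of flattening it.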

This is where I see the potential for Gantt, which will be a stand-alone 
placement decision engine and can easily accommodate different pluggable 
engines (such as Smart Scheduler 
(https://blueprints.launchpad.net/nova/+spec/solver-scheduler)) to make smart 
placement decisions.

Pointers:
Smart Resource Placement overview: 
https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1
Figure: 
https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1
Nova Design Session Etherpad: 
https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement
https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions
Smart Scheduler Blueprint: 
https://blueprints.launchpad.net/nova/+spec/solver-scheduler
Working code: https://github.com/CiscoSystems/nova-solver-scheduler

Thanks,

Yathi.

On 7/14/14, 1:40 PM, "Murray, Paul (HP Cloud)" <[email protected]> wrote:

Hi All,

I’m sorry I am so late to this lively discussion – it looks like a good one! 
Jay has been driving the debate a bit, so most of this is in response to his 
comments. But please, anyone should chip in.

On extensible resource tracking

Jay, I am surprised to hear you say no one has explained to you why there is an 
extensible resource tracking blueprint. It’s simple: there was a succession of 
blueprints wanting to add data about this and that to the resource tracker, 
the scheduler, and the database tables used to communicate between them. These 
included capabilities, all the stuff in the stats, rxtx_factor, the equivalent 
for cpu (which only works on one hypervisor, I think), and pci_stats, and more 
were coming, including:

https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
https://blueprints.launchpad.net/nova/+spec/cpu-entitlement

So, in short, your claim that there are no operators asking for additional 
stuff is simply not true.

Around about the Icehouse summit (I think) it was suggested that we should stop 
the obvious trend and add a way to make resource tracking extensible, similar 
to metrics, which had just been added as an extensible way of collecting 
ongoing usage data (because that was also wanted).

The json blob you refer to was down to the bad experience of the 
compute_node_stats table implemented for stats – which had a particular 
performance hit because it required an expensive join. This was dealt with by 
removing the table and adding a string field to contain the data as a json 
blob. A pure performance optimization. Clearly there is no need to store things 
in this way and with Nova objects being introduced there is a means to provide 
strict type checking on the data even if it is stored as json blobs in the 
database.
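
To illustrate the last point, here is a small Python sketch of type-checked 
data that is still stored as a JSON string. It is not the Nova objects API; 
the NodeStats class and its two fields are purely illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical typed view of per-node stats (illustrative fields)."""
    num_instances: int
    io_workload: int

    def __post_init__(self):
        for name, value in asdict(self).items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(
                    f"{name} must be an int, got {type(value).__name__}")

    def to_blob(self):
        """Serialize to the string that would go in the database column."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_blob(cls, blob):
        """Parse and validate; unknown or missing keys raise TypeError."""
        return cls(**json.loads(blob))


blob = NodeStats(num_instances=3, io_workload=1).to_blob()
restored = NodeStats.from_blob(blob)
```

The storage format stays a single string field (no join), while the checks 
live in the object that reads and writes it.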

On scheduler split

I have no particular position on splitting the scheduler. However, there was an 
interesting reaction to the network bandwidth entitlement blueprint listed 
above. The nova community felt it was a network thing and so nova should not 
provide it – neutron should. Of course, in nova, the nova scheduler makes 
placement decisions… can you see where this is going…? Nova needs to coordinate 
its placement decision with neutron to decide if a host has sufficient 
bandwidth available. Similar points are made about cinder – nova has no idea 
about cinder, but in some environments the location of a volume matters when 
you come to place an instance.

I should re-iterate that I have no position on splitting out the scheduler, but 
some way to deal with information from outside nova is certainly desirable. 
Maybe other services have the same dilemma.

On global resource tracker

I have to say I am inclined to be against the idea of turning the scheduler 
into a “global resource tracker”. I do see the benefit of obtaining a resource 
claim up front, we have all seen that the scheduler can make incorrect choices 
because of the delay in reflecting resource allocation to the database and so 
to the scheduler – it operates on imperfect information. However, it is best to 
avoid a global service relying on synchronous interaction with compute nodes 
during the process of servicing a request. I have looked at your example code 
for the scheduler (global resource tracker) and it seems to make a choice from 
local information and then interact with the chosen compute node to obtain a 
claim and then try again if the claim fails. I get it – I see that it deals 
with the same list of hosts on the retry. I also see it has no better chance of 
getting it right.

Your desire to have a claim is borne out by the persistent claims spec (I love 
the spec, I really don’t see why they have to be persistent). I think that is 
a great idea. Why not let the scheduler make placement suggestions (as a global 
service) and then allow conductors to obtain the claim and retry if the claim 
fails? Similar process to your code, but the scheduler only does its part and 
the conductors scale out the process by acting more locally and with more 
parallelism. (Of course, you could also be optimistic and allow the compute 
node to do the claim as part of the create as the degenerate case).
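
A rough Python sketch of that split, under stated assumptions: the scheduler 
only ranks candidates from possibly stale data, and a conductor-side loop 
obtains the claim from the compute node and moves to the next suggestion if 
the claim fails. All names here (ComputeNode, suggest_hosts, place) are 
hypothetical and do not correspond to existing Nova code.

```python
class ClaimFailed(Exception):
    """Raised by a compute node that cannot honour a claim."""


class ComputeNode:
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb):
        # The node is the authority on its own resources.
        if ram_mb > self.free_ram_mb:
            raise ClaimFailed(self.name)
        self.free_ram_mb -= ram_mb


def suggest_hosts(stale_view, ram_mb):
    """Scheduler side: rank candidates from possibly stale data.
    No claim is made here."""
    candidates = [name for name, free in stale_view.items() if free >= ram_mb]
    return sorted(candidates, key=lambda name: stale_view[name], reverse=True)


def place(nodes, suggestions, ram_mb):
    """Conductor side: try each suggestion in order until a claim succeeds."""
    for name in suggestions:
        try:
            nodes[name].claim(ram_mb)
            return name
        except ClaimFailed:
            continue
    return None


nodes = {
    "Host1": ComputeNode("Host1", 512),
    "Host2": ComputeNode("Host2", 2048),
}
# The scheduler's view of Host1 is out of date.
stale_view = {"Host1": 4096, "Host2": 2048}
chosen = place(nodes, suggest_hosts(stale_view, 1024), 1024)
```

Here the scheduler wrongly prefers Host1, the claim on Host1 fails, and the 
retry lands on Host2 without the scheduler ever talking to a compute node.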

To emphasize the point further, what would a cells scheduler do? Would that 
also make a synchronous operation to obtain the claim?

My reaction to the global resource tracker idea has been quite negative. I want 
to like the idea because I like the thought of knowing I have the resources 
when I get my answer. It’s just that I think the persistent claims (without the 
persistent part :) ) gives us a lot of what we need. But I am still open to be 
convinced.

Paul

On 07/14/2014 10:16 AM, Sylvain Bauza wrote:
> Le 12/07/2014 06:07, Jay Pipes a écrit :
>> On 07/11/2014 07:14 AM, John Garbutt wrote:
>>> On 10 July 2014 16:59, Sylvain Bauza <sbauza at redhat.com> wrote:
>>>> Le 10/07/2014 15:47, Russell Bryant a écrit :
>>>>> On 07/10/2014 05:06 AM, Sylvain Bauza wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> === tl;dr: Now that we agree on waiting for the split
>>>>>> prereqs to be done, we debate on if ResourceTracker should
>>>>>> be part of the scheduler code and consequently Scheduler
>>>>>> should expose ResourceTracker APIs so that Nova wouldn't
>>>>>> own compute nodes resources. I'm proposing to first come
>>>>>> with RT as Nova resource in Juno and move ResourceTracker
>>>>>> in Scheduler for K, so we at least merge some patches by
>>>>>> Juno. ===
>>>>>>
>>>>>> Some debates occurred recently about the scheduler split, so
>>>>>> I think it's important to loop back with you all to see
>>>>>> where we are and what are the discussions. Again, feel free
>>>>>> to express your opinions, they are welcome.
>>>>> Where did this resource tracker discussion come up?  Do you
>>>>> have any references that I can read to catch up on it?  I
>>>>> would like to see more detail on the proposal for what should
>>>>> stay in Nova vs. be moved.  What is the interface between
>>>>> Nova and the scheduler here?
>>>>
>>>> Oh, missed the most important question you asked. So, about
>>>> the interface in between scheduler and Nova, the original
>>>> agreed proposal is in the spec
>>>> https://review.openstack.org/82133 (approved) where the
>>>> Scheduler exposes : - select_destinations() : for querying the
>>>> scheduler to provide candidates - update_resource_stats() : for
>>>> updating the scheduler internal state (ie. HostState)
>>>>
>>>> Here, update_resource_stats() is called by the
>>>> ResourceTracker, see the implementations (in review)
>>>> https://review.openstack.org/82778 and
>>>> https://review.openstack.org/104556.
>>>>
>>>> The alternative that has just been raised this week is to
>>>> provide a new interface where ComputeNode claims for resources
>>>> and frees these resources, so that all the resources are fully
>>>> owned by the Scheduler. An initial PoC has been raised here
>>>> https://review.openstack.org/103598 but I tried to see what
>>>> would be a ResourceTracker proxified by a Scheduler client here
>>>> : https://review.openstack.org/105747. As the spec hasn't been
>>>> written, the names of the interfaces are not properly defined
>>>> but I made a proposal as : - select_destinations() : same as
>>>> above - usage_claim() : claim a resource amount -
>>>> usage_update() : update a resource amount - usage_drop(): frees
>>>> the resource amount
>>>>
>>>> Again, this is a dummy proposal, a spec has to be written if we
>>>> consider moving the RT.
>>>
>>> While I am not against moving the resource tracker, I feel we
>>> could move this to Gantt after the core scheduling has been
>>> moved.
>>
>> Big -1 from me on this, John.
>>
>> Frankly, I see no urgency whatsoever -- and actually very little
>> benefit -- to moving the scheduler out of Nova. The Gantt project I
>> think is getting ahead of itself by focusing on a split instead of
>> focusing on cleaning up the interfaces between nova-conductor,
>> nova-scheduler, and nova-compute.
>>
>
> -1 on saying there is no urgency. Don't you see the NFV group asking
> at each meeting what the status of the scheduler split is?

Frankly, I don't think a lot of the NFV use cases are well-defined.

Even more frankly, I don't see any benefit to a split-out scheduler to a
single NFV use case.

> Don't you see, at each Summit, lots of talks (and people attending
> them) about how OpenStack should look at Pets vs. Cattle and
> saying that the scheduler should be out of Nova?

There's been no concrete benefits discussed to having the scheduler
outside of Nova.

I don't really care how many people say that the scheduler should be out
of Nova unless those same people come to the table with concrete reasons
why. Just saying something is a benefit does not make it a benefit, and
I think I've outlined some of the very real dangers -- in terms of code
and payload complexity -- of breaking the scheduler out of Nova until
the interfaces are cleaned up and the scheduler actually owns the
resources upon which it exercises placement decisions.

> From an operator perspective, people have waited so long to have a
> scheduler doing "scheduling" and not only "resource placement".

Could you elaborate a bit here? What operators are begging for the
scheduler to do more than resource placement? And if they are begging
for this, what use cases are they trying to address?

I'm genuinely curious, so looking forward to your reply here! :)

snip...

>> As for the idea that things will get *easier* once scheduler code
>> is broken out of Nova, I go back to my original statement that I
>> don't really see the benefit of the split at this point, and I
>> would just bring up the fact that Neutron/nova-network is a shining
>> example of how things can easily backfire when splitting of code is
>> done too early before interfaces are cleaned up and
>> responsibilities between internal components are not clearly agreed
>> upon.
>
> Please, please, don't mix the rationale for extensible Resource
> Tracker and the current efforts for moving out the Scheduler. Both of
> them try to have an agnostic and heterogeneous scheduler, but both
> efforts are independent.
>
> The ResourceTracker is something pure Nova. Saying to Gantt "I want
> to store this data" and "I want you to select a destination" is
> agnostic enough that it does not require porting the
> ResourceTracker to the Scheduler.

Sorry, I'm not following you. Who is saying to Gantt "I want to store
this data"?

All I am saying is that the thing that places a resource on some
provider of that resource should be the thing that owns the process of a
requester *claiming* the resources on that provider, and in order to
properly track resources in a race-free way in such a system, then the
system needs to contain the resource tracker.

> While I approve of defining the interfaces now, there is no reason,
> though, to say we would have to change anything in how Nova is doing
> that. The role of Gantt is to define the interfaces, draw the line
> between Scheduler and Nova, and forklift the Scheduler into a single
> project. No big bang is needed here.

Yeah, I just don't see the need to split the scheduler at this point,
sorry. :(

Best,
-jay

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
