https://etherpad.openstack.org/p/SchedulerUseCases
[08:43:35] <n0ano> #action all update the use case etherpad at https://etherpad.openstack.org/p/SchedulerUseCases

Please update your use cases here ......

debo

On Mon, Jul 14, 2014 at 7:25 PM, Yathiraj Udupi (yudupi) <yud...@cisco.com> wrote:
> Hi all,
>
> Adding to the interesting discussion thread regarding the scheduler split
> and its importance, I would like to pitch in a couple of thoughts in favor
> of Gantt. At the Icehouse summit in HKG, in one of the scheduler
> design sessions, I along with a few others (cc’d) pitched a session on Smart
> Resource Placement
> (https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement),
> where we pitched for a Smart Placement Decision Engine as a Service,
> addressing cross-service scheduling as one of the use cases. We pitched the
> idea of how a stand-alone service can act as a smart resource placement
> engine (see figure:
> https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1)
> that can use state data from all the services, and make a unified placement
> decision. We have even proposed a separate blueprint
> (https://blueprints.launchpad.net/nova/+spec/solver-scheduler with working
> code now here: https://github.com/CiscoSystems/nova-solver-scheduler) called
> Smart Scheduler (Solver Scheduler), which has the goal of being able to do
> smart resource placement taking into account complex constraints
> incorporating compute (nova), storage (cinder), and network constraints. The
> existing Filter Scheduler or projects like Smart (Solver) Scheduler (for
> covering the complex constraints scenarios) could easily fulfill the
> decision making aspects of the placement engine.
>
> I believe the Gantt project is the right direction in terms of separating
> out the placement decision concern, and creating a separate scheduler as a
> service, so that it can freely talk to any of the other services, or use a
> unified global state repository and make the unified decision. Projects
> like Smart (Solver) Scheduler can easily fit into the Gantt Project as
> pluggable drivers to add the additional smarts required.
>
> To offer our Smart Scheduler as a service, we have currently prototyped
> a RESTful interface to the smart scheduler that is detached from Nova
> (loosely connected).
> For example, a RESTful request like this (where I am requesting 2 VMs with
> a requirement of 1 GB disk, and another request for 1 VM of flavor
> 'm1.tiny' that also has a special requirement that it should be close to the
> volume with uuid "ef6348300bc511e4bc4cc03fd564d1bc" (Compute-Volume
> affinity constraint)):
>
> curl -i -H "Content-Type: application/json" -X POST -d \
>   '{"instance_requests": [{"num_instances": 2, "request_properties":
>     {"instance_type": {"root_gb": 1}}}, {"num_instances": 1,
>     "request_properties": {"flavor": "m1.tiny", "volume_affinity":
>     "ef6348300bc511e4bc4cc03fd564d1bc"}}]}' \
>   http://<x.x.x.x>/smart-scheduler-as-a-service/v1.0/placement
>
> provides a placement decision something like this:
>
> {
>   "result": [
>     [
>       {
>         "host": {
>           "host": "Host1",
>           "nodename": "Node1"
>         },
>         "instance_uuid": "VM_ID_0_0"
>       },
>       {
>         "host": {
>           "host": "Host2",
>           "nodename": "Node2"
>         },
>         "instance_uuid": "VM_ID_0_1"
>       }
>     ],
>     [
>       {
>         "host": {
>           "host": "Host1",
>           "nodename": "Node1"
>         },
>         "instance_uuid": "VM_ID_1_0"
>       }
>     ]
>   ]
> }
>
> This placement result can be used by Nova to proceed and complete the
> scheduling.
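
A minimal Python sketch of the request and response shapes in Yathi's example, for readers who want to work with the payload programmatically. The JSON keys and sample values are copied from the email; the helper names (build_placement_request, hosts_by_instance) are hypothetical and not part of the prototype, and nothing here contacts a real service.

```python
import json


def build_placement_request():
    """Hypothetical helper: the request body from the curl example above."""
    return {
        "instance_requests": [
            # Two VMs that each need a 1 GB root disk.
            {"num_instances": 2,
             "request_properties": {"instance_type": {"root_gb": 1}}},
            # One m1.tiny VM that should sit close to a given volume.
            {"num_instances": 1,
             "request_properties": {
                 "flavor": "m1.tiny",
                 "volume_affinity": "ef6348300bc511e4bc4cc03fd564d1bc"}},
        ]
    }


# Sample response copied from the email: one inner list per instance request.
SAMPLE_RESPONSE = json.dumps({
    "result": [
        [{"host": {"host": "Host1", "nodename": "Node1"},
          "instance_uuid": "VM_ID_0_0"},
         {"host": {"host": "Host2", "nodename": "Node2"},
          "instance_uuid": "VM_ID_0_1"}],
        [{"host": {"host": "Host1", "nodename": "Node1"},
          "instance_uuid": "VM_ID_1_0"}],
    ]
})


def hosts_by_instance(response_text):
    """Hypothetical helper: flatten "result" into {uuid: (host, nodename)}."""
    placement = {}
    for group in json.loads(response_text)["result"]:
        for entry in group:
            host = entry["host"]
            placement[entry["instance_uuid"]] = (host["host"], host["nodename"])
    return placement


body = json.dumps(build_placement_request())  # what would be POSTed
placement = hosts_by_instance(SAMPLE_RESPONSE)
```

The nested "result" list mirrors the order of "instance_requests", so a caller such as Nova could map each inner list back to the request that produced it.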
>
> This is where I see the potential for Gantt, which will be a stand-alone
> placement decision engine, and can easily accommodate different pluggable
> engines (such as Smart Scheduler
> (https://blueprints.launchpad.net/nova/+spec/solver-scheduler)) to do smart
> placement decisions.
>
> Pointers:
> Smart Resource Placement overview:
> https://docs.google.com/document/d/1IiPI0sfaWb1bdYiMWzAAx0HYR6UqzOan_Utgml5W1HI/edit?pli=1
> Figure:
> https://docs.google.com/drawings/d/1BgK1q7gl5nkKWy3zLkP1t_SNmjl6nh66S0jHdP0-zbY/edit?pli=1
> Nova Design Session Etherpad:
> https://etherpad.openstack.org/p/NovaIcehouse-Smart-Resource-Placement
> https://etherpad.openstack.org/p/IceHouse-Nova-Scheduler-Sessions
> Smart Scheduler Blueprint:
> https://blueprints.launchpad.net/nova/+spec/solver-scheduler
> Working code: https://github.com/CiscoSystems/nova-solver-scheduler
>
> Thanks,
>
> Yathi.
>
> On 7/14/14, 1:40 PM, "Murray, Paul (HP Cloud)" <pmur...@hp.com> wrote:
>
> Hi All,
>
> I’m sorry I am so late to this lively discussion - it looks a good one! Jay
> has been driving the debate a bit so most of this is in response to his
> comments. But please, anyone should chip in.
>
> On extensible resource tracking
>
> Jay, I am surprised to hear you say no one has explained to you why there is
> an extensible resource tracking blueprint. It’s simple, there was a
> succession of blueprints wanting to add data about this and that to the
> resource tracker and the scheduler and the database tables used to
> communicate. These included capabilities, all the stuff in the stats,
> rxtx_factor, the equivalent for cpu (which only works on one hypervisor I
> think), pci_stats, and more were coming, including:
>
> https://blueprints.launchpad.net/nova/+spec/network-bandwidth-entitlement
> https://blueprints.launchpad.net/nova/+spec/cpu-entitlement
>
> So, in short, your claim that there are no operators asking for additional
> stuff is simply not true.
>
> Around about the Icehouse summit (I think) it was suggested that we should
> stop the obvious trend and add a way to make resource tracking extensible,
> similar to metrics, which had just been added as an extensible way of
> collecting ongoing usage data (because that was also wanted).
>
> The JSON blob you refer to was down to the bad experience of the
> compute_node_stats table implemented for stats - which had a particular
> performance hit because it required an expensive join. This was dealt with
> by removing the table and adding a string field to contain the data as a
> JSON blob. A pure performance optimization. Clearly there is no need to
> store things in this way, and with Nova objects being introduced there is a
> means to provide strict type checking on the data even if it is stored as
> JSON blobs in the database.
>
> On scheduler split
>
> I have no particular position on splitting the scheduler. However, there was
> an interesting reaction to the network bandwidth entitlement blueprint
> listed above. The nova community felt it was a network thing and so nova
> should not provide it - neutron should. Of course, in nova, the nova
> scheduler makes placement decisions… can you see where this is going…? Nova
> needs to coordinate its placement decision with neutron to decide if a host
> has sufficient bandwidth available. Similar points are made about cinder -
> nova has no idea about cinder, but in some environments the location of a
> volume matters when you come to place an instance.
>
> I should re-iterate that I have no position on splitting out the scheduler,
> but some way to deal with information from outside nova is certainly
> desirable. Maybe other services have the same dilemma.
>
> On global resource tracker
>
> I have to say I am inclined to be against the idea of turning the scheduler
> into a “global resource tracker”.
> I do see the benefit of obtaining a
> resource claim up front; we have all seen that the scheduler can make
> incorrect choices because of the delay in reflecting resource allocation to
> the database and so to the scheduler - it operates on imperfect information.
> However, it is best to avoid a global service relying on synchronous
> interaction with compute nodes during the process of servicing a request. I
> have looked at your example code for the scheduler (global resource tracker)
> and it seems to make a choice from local information and then interact with
> the chosen compute node to obtain a claim and then try again if the claim
> fails. I get it - I see that it deals with the same list of hosts on the
> retry. I also see it has no better chance of getting it right.
>
> Your desire to have a claim is borne out by the persistent claims spec (I
> love the spec, I really don’t see why they have to be persistent). I think
> that is a great idea. Why not let the scheduler make placement suggestions
> (as a global service) and then allow conductors to obtain the claim and
> retry if the claim fails? Similar process to your code, but the scheduler
> only does its part and the conductors scale out the process by acting more
> locally and with more parallelism. (Of course, you could also be optimistic
> and allow the compute node to do the claim as part of the create as the
> degenerate case).
>
> To emphasize the point further, what would a cells scheduler do? Would that
> also make a synchronous operation to obtain the claim?
>
> My reaction to the global resource tracker idea has been quite negative. I
> want to like the idea because I like the thought of knowing I have the
> resources when I get my answer. It's just that I think the persistent claims
> (without the persistent part :) ) give us a lot of what we need. But I am
> still open to be convinced.
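
The split Paul suggests (the scheduler only suggests hosts from possibly stale data, and a conductor obtains the claim and retries) can be sketched in a few lines of Python. This is an illustrative in-memory model under stated assumptions, not Nova code: ComputeNode, ClaimFailed, suggest_hosts and place_with_claim are hypothetical names, and RAM stands in for every resource.

```python
class ClaimFailed(Exception):
    """Raised when a node cannot satisfy a claim (hypothetical)."""


class ComputeNode:
    """Stand-in for a compute node holding the authoritative free RAM."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def claim(self, ram_mb):
        # The claim is checked against real state, not the scheduler's view.
        if ram_mb > self.free_ram_mb:
            raise ClaimFailed(self.name)
        self.free_ram_mb -= ram_mb


def suggest_hosts(stale_view, ram_mb):
    """Scheduler role: rank candidates from a possibly outdated view.

    It never talks to compute nodes, so it stays asynchronous.
    """
    ranked = sorted(stale_view.items(), key=lambda kv: -kv[1])
    return [name for name, free in ranked if free >= ram_mb]


def place_with_claim(nodes, stale_view, ram_mb):
    """Conductor role: try each suggestion until one claim succeeds."""
    for name in suggest_hosts(stale_view, ram_mb):
        try:
            nodes[name].claim(ram_mb)
            return name
        except ClaimFailed:
            continue  # the scheduler's data was stale, try the next host
    return None


nodes = {"host1": ComputeNode("host1", 512),
         "host2": ComputeNode("host2", 2048)}
# The scheduler still believes host1 has 4096 MB free.
stale_view = {"host1": 4096, "host2": 2048}
chosen = place_with_claim(nodes, stale_view, 1024)
```

In this run the first suggestion (host1) fails its claim because the scheduler's view is stale, and the conductor falls through to host2 without going back to the scheduler.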
>
> Paul
>
> On 07/14/2014 10:16 AM, Sylvain Bauza wrote:
> >> On 12/07/2014 06:07, Jay Pipes wrote:
> >>> On 07/11/2014 07:14 AM, John Garbutt wrote:
> >>>> On 10 July 2014 16:59, Sylvain Bauza <sbauza at redhat.com> wrote:
> >>>>> On 10/07/2014 15:47, Russell Bryant wrote:
> >>>>>> On 07/10/2014 05:06 AM, Sylvain Bauza wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> === tl;dr: Now that we agree on waiting for the split
> >>>>>>> prereqs to be done, we debate on if ResourceTracker should
> >>>>>>> be part of the scheduler code and consequently Scheduler
> >>>>>>> should expose ResourceTracker APIs so that Nova wouldn't
> >>>>>>> own compute nodes resources. I'm proposing to first come
> >>>>>>> with RT as Nova resource in Juno and move ResourceTracker
> >>>>>>> in Scheduler for K, so we at least merge some patches by
> >>>>>>> Juno. ===
> >>>>>>>
> >>>>>>> Some debates occurred recently about the scheduler split, so
> >>>>>>> I think it's important to loop back with you all to see
> >>>>>>> where we are and what are the discussions. Again, feel free
> >>>>>>> to express your opinions, they are welcome.
> >>>>>> Where did this resource tracker discussion come up? Do you
> >>>>>> have any references that I can read to catch up on it? I
> >>>>>> would like to see more detail on the proposal for what should
> >>>>>> stay in Nova vs. be moved. What is the interface between
> >>>>>> Nova and the scheduler here?
> >>>>>
> >>>>> Oh, missed the most important question you asked. So, about
> >>>>> the interface in between scheduler and Nova, the original
> >>>>> agreed proposal is in the spec
> >>>>> https://review.openstack.org/82133 (approved) where the
> >>>>> Scheduler exposes:
> >>>>> - select_destinations(): for querying the scheduler to provide
> >>>>>   candidates
> >>>>> - update_resource_stats(): for updating the scheduler internal
> >>>>>   state (ie.
> >>>>>   HostState)
> >>>>>
> >>>>> Here, update_resource_stats() is called by the
> >>>>> ResourceTracker, see the implementations (in review)
> >>>>> https://review.openstack.org/82778 and
> >>>>> https://review.openstack.org/104556.
> >>>>>
> >>>>> The alternative that has just been raised this week is to
> >>>>> provide a new interface where ComputeNode claims for resources
> >>>>> and frees these resources, so that all the resources are fully
> >>>>> owned by the Scheduler. An initial PoC has been raised here
> >>>>> https://review.openstack.org/103598 but I tried to see what
> >>>>> would be a ResourceTracker proxified by a Scheduler client
> >>>>> here: https://review.openstack.org/105747. As the spec hasn't been
> >>>>> written, the names of the interfaces are not properly defined
> >>>>> but I made a proposal as:
> >>>>> - select_destinations(): same as above
> >>>>> - usage_claim(): claim a resource amount
> >>>>> - usage_update(): update a resource amount
> >>>>> - usage_drop(): frees the resource amount
> >>>>>
> >>>>> Again, this is a dummy proposal, a spec has to be written if we
> >>>>> consider moving the RT.
> >>>>
> >>>> While I am not against moving the resource tracker, I feel we
> >>>> could move this to Gantt after the core scheduling has been
> >>>> moved.
> >>>
> >>> Big -1 from me on this, John.
> >>>
> >>> Frankly, I see no urgency whatsoever -- and actually very little
> >>> benefit -- to moving the scheduler out of Nova. The Gantt project I
> >>> think is getting ahead of itself by focusing on a split instead of
> >>> focusing on cleaning up the interfaces between nova-conductor,
> >>> nova-scheduler, and nova-compute.
> >>>
> >>
> >> -1 on saying there is no urgency. Don't you see the NFV group saying
> >> each meeting what is the status of the scheduler split ?
> >
> > Frankly, I don't think a lot of the NFV use cases are well-defined.
> >
> > Even more frankly, I don't see any benefit to a split-out scheduler to a
> > single NFV use case.
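
For reference, the claim-style interface Sylvain lists earlier in this message (select_destinations, usage_claim, usage_update, usage_drop) could look roughly like the Python sketch below. Only the four method names come from the thread; the signatures, the claim ids, the single numeric resource and the SchedulerResourceTracker class name are all assumptions made for illustration, since no spec had been written.

```python
class SchedulerResourceTracker:
    """In-memory stand-in for a scheduler that owns resource accounting."""

    def __init__(self, capacity):
        # capacity: {host: free units of one illustrative resource}
        self.free = dict(capacity)
        self.claims = {}  # claim_id -> (host, amount)
        self._next_id = 0

    def select_destinations(self, amount):
        """Return hosts that could currently hold `amount`."""
        return [h for h, free in self.free.items() if free >= amount]

    def usage_claim(self, host, amount):
        """Claim a resource amount on a host and return a claim id."""
        if self.free[host] < amount:
            raise ValueError("not enough free resource on %s" % host)
        self.free[host] -= amount
        self._next_id += 1
        self.claims[self._next_id] = (host, amount)
        return self._next_id

    def usage_update(self, claim_id, amount):
        """Resize an existing claim to a new amount."""
        host, old = self.claims[claim_id]
        delta = amount - old
        if self.free[host] < delta:
            raise ValueError("not enough free resource on %s" % host)
        self.free[host] -= delta
        self.claims[claim_id] = (host, amount)

    def usage_drop(self, claim_id):
        """Free the resource amount held by a claim."""
        host, amount = self.claims.pop(claim_id)
        self.free[host] += amount


tracker = SchedulerResourceTracker({"host1": 4, "host2": 8})
claim_id = tracker.usage_claim("host2", 6)   # host2 now has 2 free
tracker.usage_update(claim_id, 3)            # shrink the claim: 5 free
candidates = tracker.select_destinations(5)
tracker.usage_drop(claim_id)                 # host2 back to 8 free
```

Because claims and free capacity live in the same object, selection and claiming see the same state, which is the property the claim-based proposal is after.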
> >
> >> Don't you see each Summit the lots of talks (and people attending
> >> them) talking about how OpenStack should look at Pets vs. Cattle and
> >> saying that the scheduler should be out of Nova ?
> >
> > There's been no concrete benefits discussed to having the scheduler
> > outside of Nova.
> >
> > I don't really care how many people say that the scheduler should be out
> > of Nova unless those same people come to the table with concrete reasons
> > why. Just saying something is a benefit does not make it a benefit, and
> > I think I've outlined some of the very real dangers -- in terms of code
> > and payload complexity -- of breaking the scheduler out of Nova until
> > the interfaces are cleaned up and the scheduler actually owns the
> > resources upon which it exercises placement decisions.
> >
> >> From an operator perspective, people waited so long for having a
> >> scheduler doing "scheduling" and not only "resource placement".
> >
> > Could you elaborate a bit here? What operators are begging for the
> > scheduler to do more than resource placement? And if they are begging
> > for this, what use cases are they trying to address?
> >
> > I'm genuinely curious, so looking forward to your reply here! :)
> >
> > snip...
> >
> >>> As for the idea that things will get *easier* once scheduler code
> >>> is broken out of Nova, I go back to my original statement that I
> >>> don't really see the benefit of the split at this point, and I
> >>> would just bring up the fact that Neutron/nova-network is a shining
> >>> example of how things can easily backfire when splitting of code is
> >>> done too early, before interfaces are cleaned up and
> >>> responsibilities between internal components are clearly agreed
> >>> upon.
> >>
> >> Please, please, don't mix the rationale for extensible Resource
> >> Tracker and the current efforts for moving out the Scheduler.
> >> Both of them try to have an agnostic and heterogeneous scheduler,
> >> but both efforts are independent.
> >>
> >> The ResourceTracker is something pure Nova. Saying to Gantt "I want
> >> to store this data" and "I want you to select a destination" is
> >> something enough agnostic for not including the port of
> >> ResourceTracker to the Scheduler.
> >
> > Sorry, I'm not following you. Who is saying to Gantt "I want to store
> > this data"?
> >
> > All I am saying is that the thing that places a resource on some
> > provider of that resource should be the thing that owns the process of a
> > requester *claiming* the resources on that provider, and in order to
> > properly track resources in a race-free way in such a system, then the
> > system needs to contain the resource tracker.
> >
> >> While I approve to define the interfaces now, there is no reason tho
> >> to say we would have to change anything in how Nova is doing that.
> >> The role of Gantt is to define the interfaces, make the line
> >> Scheduler vs. Nova and forklift the Scheduler into a single project.
> >> No big bang is needed here.
> >
> > Yeah, I just don't see the need to split the scheduler at this point,
> > sorry. :(
> >
> > Best,
> > -jay

--
-Debo~

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev