Re: [openstack-dev] [nova] [gantt] Scheduler cleanup - what did we agree to

Sylvain Bauza Mon, 08 Sep 2014 01:36:07 -0700

Hi Don,

Adding [nova] to the subject too, because otherwise we could miss some people here.



On 08/09/2014 07:24, Dugger, Donald D wrote:

As I mentioned in a prior email, although we’re in agreement on what needs to be done before splitting out the scheduler into the Gantt project, I believe we have different views on what that agreement actually is. Given that we have multiple people who actively want to work on this split, I would like to try and put down the specifics of what needs to be accomplished.
As I see it, the top-level issue is cleaning up the internal interfaces between the Nova core code and the scheduler, specifically:
1) The client interface

a. Done – we’ve created and pushed a patch to address this interface

+1. Scheduler-lib is now merged, but it still uses JSON dicts to pass updates to the scheduler. The main point of this blueprint was to create a new interface for sending stats updates to the Scheduler, as the Resource Tracker (RT) was previously sending DB modifications directly to the conductor (not even using objects yet).

2) Database access
a. Ongoing – we’ve created a patch that missed the Juno deadline; we'll try again in Kilo

This isolate-scheduler-db blueprint was based on the Extensible Resource Tracker (ERT). As ERT is not yet fully merged upstream (the scheduler part is still under review) and as it does not provide a clear interface for stats (it just adds nested dicts to a big JSON string), we decided to look at other options for sending the updates that the filters need in order to look at HostState instead of directly calling other Nova objects.

3) Resource Tracker

a. Identify what data is sent from compute to scheduler

b. Track that data inside the scheduler

c. Not started yet (being discussed)

To be precise, we need to clearly define the interface that the Scheduler exposes. As I said above, there is now another method called update_resource_stats(name, stats), which provides a first endpoint for sending updates to the Scheduler, but we need to strengthen this method by adding validation and typing instead of passing a blob.
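To illustrate what "validation and typing instead of a blob" could look like, here is a minimal Python sketch. Only the update_resource_stats(name, stats) signature comes from the discussion above; the ResourceStats class, its field names, and the SchedulerClient stand-in are hypothetical and are not the actual Nova code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceStats:
    """Hypothetical typed payload; the field names are illustrative only."""
    vcpus: int
    memory_mb: int
    local_gb: int

    def __post_init__(self):
        for field_name in ("vcpus", "memory_mb", "local_gb"):
            value = getattr(self, field_name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(
                    f"{field_name} must be an int, got {type(value).__name__}")
            if value < 0:
                raise ValueError(f"{field_name} must be >= 0, got {value}")


class SchedulerClient:
    """Stand-in for the scheduler client; keeps the last stats per host."""

    def __init__(self):
        self.host_stats = {}

    def update_resource_stats(self, name, stats):
        # Reject untyped blobs (plain dicts, JSON strings) at the boundary.
        if not isinstance(stats, ResourceStats):
            raise TypeError("stats must be a ResourceStats instance")
        self.host_stats[name] = stats


client = SchedulerClient()
client.update_resource_stats(
    "compute-1", ResourceStats(vcpus=8, memory_mb=16384, local_gb=200))
```

With this shape, a caller passing a raw dict such as {"vcpus": 8} fails immediately with a TypeError instead of silently storing an unchecked blob.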

On the other hand, we also need to make sure that the claiming mechanism is robust enough to support various kinds of claims (the NUMA patches that were sent showed that there is room for improvement here). Ideally, claiming should be done in the Scheduler itself (by having a distributed transactional model for concurrent scheduling requests to multiple schedulers).

These to me are the critical items for the split. Yes, there are lots of other areas/interfaces inside Nova that should be cleaned up, but the goal here is to split out the scheduler, not to refactor every interface inside Nova.

Indeed, we need to keep in mind the objective of splitting the Scheduler as soon as possible. That's why I'm proposing a strategy of sending stats updates to the Scheduler by passing Nova objects (ComputeNode here) through the update_resource_stats() method mentioned above, and of adding the instance_claim and resize_claim methods to the ComputeNode object itself, so that a select_destinations() call from the conductor could issue a call to each ComputeNode it elects to make sure that node has enough resources. The current RT claims would be kept for backwards compatibility and double-checking until we consider the new workflow good enough to remove them.
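The proposed workflow can be sketched roughly as follows. This is a single-process toy under stated assumptions: the ComputeNode class here is a hypothetical stand-in (not the real Nova object), the resource fields and the "most free RAM first" weighing are invented for illustration, and RPC, persistence, and concurrency between multiple schedulers are ignored entirely.

```python
class ComputeNode:
    """Toy stand-in for the Nova ComputeNode object (not the real class)."""

    def __init__(self, host, free_vcpus, free_ram_mb):
        self.host = host
        self.free_vcpus = free_vcpus
        self.free_ram_mb = free_ram_mb

    def instance_claim(self, vcpus, ram_mb):
        """Reserve resources if they fit; return True on success."""
        if vcpus > self.free_vcpus or ram_mb > self.free_ram_mb:
            return False
        self.free_vcpus -= vcpus
        self.free_ram_mb -= ram_mb
        return True


def select_destinations(nodes, vcpus, ram_mb, num_instances=1):
    """Elect hosts and claim on each one; skip nodes whose claim fails."""
    selected = []
    # Naive weighing (an assumption): prefer the node with the most free RAM.
    for node in sorted(nodes, key=lambda n: n.free_ram_mb, reverse=True):
        if len(selected) == num_instances:
            break
        if node.instance_claim(vcpus, ram_mb):
            selected.append(node.host)
    return selected


nodes = [ComputeNode("host-a", 2, 2048), ComputeNode("host-b", 8, 8192)]
print(select_destinations(nodes, vcpus=4, ram_mb=4096))  # prints ['host-b']
```

The point of the sketch is that the claim happens as part of the election itself, so a host is only returned once its resources have actually been reserved; the existing RT claim on the compute side would then act as a second check.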

The above strategy comes from a braindump, but I estimate it to be the lowest common denominator for all the necessary changes. I'm really concerned about any attempt at a big-bang change here, which could lead us to lose focus on splitting the scheduler.

Feel free to correct this email, but I really want to make sure we are all in agreement on the same thing so that we can actually get something done.

Yeah, I assume that's quite frustrating because the design phase has not yet ended. IMHO we need to find some sort of online meetup for discussing all the bits of the split, as we can't wait for the Kilo Summit. The Gantt meetings are obviously not the right place for discussing design and implementation, so we need to promote online tools for doing such work.

We need ideas, we need volunteers, so feel free to raise your hand (and your voice) if you, reader, are willing to work on that effort.


-Sylvain

--

Don Dugger

"Censeo Toto nos in Kansa esse decisse." - D. Gale

Ph: 303/443-3786



_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
