Hi all,

The Nova midcycle went very well from an Ironic perspective. Jay Pipes and Andrew Laski had a new (better!) proposal for how we schedule Ironic resources, which I'd like to detail for folks. We explicitly split this off from the topic of running multiple nova-compute daemons for Ironic, because they are separate topics. But some of us also discussed that piece, and I think we have a good path forward. We also talked about how we plan Ironic driver work in Nova, and I'll detail some of that discussion a bit.
Scheduling to Ironic resources
==============================

Jay and Andrew presented their vision for this, which depends heavily on the work in progress on the placement engine, specifically "resource providers" and "dynamic resource classes".[0] There are a few distinct pieces to talk about here. Others who were in the room, please correct me if I'm wrong anywhere. :)

First, the general goal for resource providers (which may be a compute host or something else) is that by the Newton release, information is being reported into the placement database. This means that when someone upgrades to Ocata, the data will already be there for the scheduler to use, and there will be no blip in service while this data is collected.

As a note, a resource provider is a thing that provides some quantitative amount of a "resource class". A resource class may be something like vCPUs, RAM, disk, etc. The "dynamic resource class" spec[0] allows resource classes to be created dynamically via the placement API.

For Ironic, each node will be a resource provider. Each node will provide 0 or 1 of a given resource class, depending on whether the node is schedulable or not (e.g. maintenance mode, enroll state, etc.).

For the Newton cycle, we want to be putting this resource provider data for Ironic into the placement database. To do this, the resource tracker will be modified such that an Ironic node reported back will be put in the compute_nodes table (as before), and also in the resource providers table. Since each resource provider needs a resource class, Nova needs to be able to find the resource class in the dict passed back to the resource tracker. As such, I've proposed a spec to Ironic[1] and some code changes to Ironic, python-ironicclient, and Nova[2] to pass this information back to the resource tracker. This is done by putting a field on the node object in Ironic called `resource_class` (surprise!). I promise I tried to think of a better name for this and completely failed.

In Ocata, we want to begin scheduling Ironic instances to resource providers. To accomplish this, Nova flavors will be able to "require" or "prefer" a given quantity of some resource class. For an Ironic flavor, this will (almost?) always be a required quantity of 1 of an Ironic resource class. Note that we didn't discuss what happens in Ocata if Ironic nodes don't have a resource class set and/or flavors don't require some Ironic resource class. I have some thoughts on this, but they aren't solidified enough to write here without chatting with the Nova folks to make sure I'm not crazy.

So, between Newton and Ocata, operators will need to set the resource class for each node in Ironic, and require the resource classes for each Ironic flavor in Nova (see the rough sketch at the end of this section).

It's very important we get the work mentioned for Newton done in Newton. If it doesn't land until Ocata, operators will get a nice surprise when the compute daemon starts up: no resources will be available until they've populated the field in Ironic (because it didn't exist in the Newton version of Ironic) and the resource tracker takes its sweet time picking up that field from the Ironic nodes.

Also of note: in Ocata, a placement API will be available for Ironic to talk to directly. This means that when a state changes in Ironic (e.g. maintenance mode is turned on/off, cleaning->available, etc.), we can immediately tell the placement API whether the resource (node) is available to schedule to. This will help eliminate many of the scheduling races we have between Nova and Ironic.
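To make the operator step above concrete, here's a rough sketch of setting the proposed `resource_class` field on a node with python-ironicclient, assuming the spec[1] and client changes[2] land as described. The class name, node UUID, and auth details are placeholders (the exact auth kwargs vary by client version), not anything we've agreed on:

    # Rough sketch only -- the field name comes from the proposed spec[1];
    # the auth kwargs and all values here are placeholders.
    from ironicclient import client as ironic_client

    ironic = ironic_client.get_client(
        1,  # API major version
        os_auth_token='<keystone-token>',
        ironic_url='http://ironic-api.example.com:6385')

    # Standard JSON-patch node update; 'baremetal-gold' is a made-up class name.
    patch = [{'op': 'add', 'path': '/resource_class', 'value': 'baremetal-gold'}]
    ironic.node.update('<node-uuid>', patch)

The flavor side (how a flavor "requires" 1 of that class) wasn't nailed down at the midcycle, so I won't guess at syntax here.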
[0] https://review.openstack.org/#/c/312696/
[1] https://review.openstack.org/#/c/345040
[2] https://review.openstack.org/#/q/topic:bug/1604916


Multiple compute daemons
========================

This was interesting. Someone (Dan Smith?) proposed doing consistent hashing of the Ironic nodes across the compute daemons, such that each daemon manages some subset of the Ironic nodes. This would likely use the same code we already use in Ironic to decide which conductor manages which nodes (we'd put that code into oslo). There's a toy sketch of the idea in the P.S. at the end of this mail.

Once an instance is placed on a compute daemon, the node that instance is on would always be managed by that daemon, until the instance is deleted. This is because Nova has strong assumptions that an instance is always managed by the same compute daemon unless it is migrated. We could write code to "re-home" an instance to another compute daemon if the hash ring changes, but that would be down the road a bit. So, a given compute daemon would manage (nodes with instances managed by that daemon) + (some subset of nodes decided by the hash ring).

This would mean that we could scale compute daemons horizontally very easily, and if one fails, automatically re-balance so that no nodes are left behind. Only existing instances would not be able to be managed (until we wrote some re-homing code).

I'm going to play with a POC soon - I welcome any help if others want to play with this as well. :) I seem to remember this being proposed in the past and being shot down, but nobody present could remember why. If someone does recall, speak up. We tried to shoot as many holes in this as possible and couldn't penetrate it.

Planning Ironic virt driver work
================================

I planned to bring this up at some point, but it ended up coming up organically during a discussion on Neutron and live migration. We essentially decided that when there's a significant amount of Nova changes (for some definition of "significant"), we should do a couple of things:

1) Make sure the Nova team buys into the architecture. This could be in the form of a backlog spec being approved, or even just some +1s from nova-specs core members on a spec for the current cycle.

2) Wait to approve the Nova side of the work until the Ironic side is done (or close to done).

This should help ensure that the Nova team can plan accordingly for the work coming into the Ironic virt driver, without bumping it to the next cycle when the Ironic side doesn't get finished before the non-priority feature freeze.

As always, questions/comments/concerns on the above are welcome. If there are none, let's go ahead and get to work on the scheduling bits in the first section.

Thanks for reading my novel.

// jim
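P.S. For the consistent hashing idea above, here's a toy sketch of the concept. This is mine, not Ironic's actual hash ring code, and it ignores the "instances stay with the daemon that built them" rule: each compute host owns a set of points on a ring, and a node is managed by whichever host owns the first point at or after the node's own hash, so adding or removing a host only moves a small fraction of the nodes.

    # Toy consistent hash ring sketch -- illustrative only.
    import bisect
    import hashlib


    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)


    class HashRing(object):
        def __init__(self, hosts, replicas=64):
            # replicas = virtual points per host, to smooth the distribution.
            self._ring = sorted((_hash('%s-%d' % (host, i)), host)
                                for host in hosts for i in range(replicas))
            self._keys = [k for k, _ in self._ring]

        def get_host(self, node_uuid):
            # First ring point at or after the node's hash, wrapping around.
            idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
            return self._ring[idx][1]


    # Example: three compute daemons splitting four nodes.
    ring = HashRing(['compute-1', 'compute-2', 'compute-3'])
    for node in ['node-a', 'node-b', 'node-c', 'node-d']:
        print('%s -> %s' % (node, ring.get_host(node)))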
