Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode
On Tue, Mar 18, 2014 at 12:24 PM, Robert Collins wrote:
> On 15 March 2014 13:07, Devananda van der Veen wrote:
> > +1 to the idea.
> >
> > However, I think we should discuss whether the rescue interface is the
> > appropriate path. Its initial intention was to tie into Nova's rescue
> > interface, allowing a user whose instance is non-responsive to boot into a
> > recovery mode and access the data stored within their instance. I think
> > there are two different use-cases here:
> >
> > Case A: a user of Nova who somehow breaks their instance and wants to boot
> > into a "rescue" or "recovery" mode, preserving instance data. This is
> > useful if, e.g., they lost network access or broke their grub config.
> >
> > Case B: an operator of the baremetal cloud whose hardware may be
> > malfunctioning, who wishes to hide that hardware from users of Case A
> > while they diagnose and fix the underlying problem.
> >
> > As I see it, Nova's rescue API (and by extension, the same API in Ironic)
> > is intended for A, but not for B. TripleO's use case includes both of
> > these, and may be conflating them.
>
> I agree.
>
> > I believe Case A is addressed by the planned driver.rescue interface. As
> > for Case B, I think the solution is to use different tenants and move the
> > node between them. This is a more complex problem -- Ironic does not model
> > tenants, and AFAIK Nova doesn't reserve unallocated compute resources on a
> > per-tenant basis.
> >
> > That said, I think we will need a way to indicate "this bare metal node
> > belongs to that tenant", regardless of the rescue use case.
>
> I'm not sure Ironic should be involved in scheduling (and giving a
> node to a tenant is a scheduling problem).

Ironic does not need to make scheduling decisions for nodes to be associated
with specific tenants. It merely needs to store the tenant_id and expose it to
a (potentially new) filter scheduler that matches on it in a way that prevents
users of Nova from explicitly choosing machines that "belong" to other
tenants. I think the only work needed for this is a new scheduler filter, a
few lines in the Nova driver to expose the info to it, and for the operator to
stash a tenant ID in Ironic using the existing API to update the
node.properties field. I don't envision that Nova should ever change the
node->tenant mapping.

> If I may sketch an alternative - when a node is put into maintenance
> mode, keep publishing it to the scheduler, but add an extra spec to it
> that won't match any request automatically.
>
> Then 'deploy X to a maintenance mode machine' is a simple nova boot with
> a scheduler hint to explicitly choose that machine, and all the
> regular machinery will take place.

That should also work :) I don't see any reason why we can't do both.

-Deva

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
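[Editor's note: the tenant_id scheduler filter Deva describes above can be sketched as a small standalone predicate. This is a hypothetical illustration only, not an existing Nova filter; the `tenant_id` property key and the dict shapes are assumptions.]

```python
# Sketch of the proposed filter: a node may carry an optional tenant_id
# in its properties (stashed there by the operator via Ironic's existing
# node-update API and exposed to the scheduler by the Nova driver).
# Nodes with no owner stay schedulable by everyone; owned nodes match
# only their own tenant. All names here are illustrative.

def tenant_filter_passes(node_properties, request_tenant_id):
    """Return True if the requesting tenant may be scheduled on this node.

    node_properties: dict mirroring Ironic's node.properties field.
    request_tenant_id: ID of the tenant making the boot request.
    """
    owner = node_properties.get('tenant_id')
    if owner is None:
        # Unowned nodes remain available to all tenants.
        return True
    # Owned nodes match only requests from the owning tenant.
    return owner == request_tenant_id
```

An operator would mark a node with something along the lines of `ironic node-update $NODE_UUID add properties/tenant_id=$TENANT_ID` (exact client syntax may vary); the scheduler side would then apply the predicate above to every candidate host.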
Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode
On 15 March 2014 13:07, Devananda van der Veen wrote:
> +1 to the idea.
>
> However, I think we should discuss whether the rescue interface is the
> appropriate path. Its initial intention was to tie into Nova's rescue
> interface, allowing a user whose instance is non-responsive to boot into a
> recovery mode and access the data stored within their instance. I think
> there are two different use-cases here:
>
> Case A: a user of Nova who somehow breaks their instance and wants to boot
> into a "rescue" or "recovery" mode, preserving instance data. This is
> useful if, e.g., they lost network access or broke their grub config.
>
> Case B: an operator of the baremetal cloud whose hardware may be
> malfunctioning, who wishes to hide that hardware from users of Case A
> while they diagnose and fix the underlying problem.
>
> As I see it, Nova's rescue API (and by extension, the same API in Ironic)
> is intended for A, but not for B. TripleO's use case includes both of
> these, and may be conflating them.

I agree.

> I believe Case A is addressed by the planned driver.rescue interface. As
> for Case B, I think the solution is to use different tenants and move the
> node between them. This is a more complex problem -- Ironic does not model
> tenants, and AFAIK Nova doesn't reserve unallocated compute resources on a
> per-tenant basis.
>
> That said, I think we will need a way to indicate "this bare metal node
> belongs to that tenant", regardless of the rescue use case.

I'm not sure Ironic should be involved in scheduling (and giving a node to a
tenant is a scheduling problem).

If I may sketch an alternative - when a node is put into maintenance mode,
keep publishing it to the scheduler, but add an extra spec to it that won't
match any request automatically.

Then 'deploy X to a maintenance mode machine' is a simple nova boot with a
scheduler hint to explicitly choose that machine, and all the regular
machinery will take place.

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
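[Editor's note: Rob's alternative — keep publishing maintenance nodes to the scheduler, but make them unmatchable unless a boot request explicitly names them via a scheduler hint — could be sketched roughly as follows. All names (`maintenance`, `force_host`, the dict shapes) are assumptions for illustration, not Nova's actual filter API.]

```python
# Sketch of the maintenance-mode filter Rob proposes: nodes in
# maintenance stay visible to the scheduler but carry a flag that no
# ordinary request matches; an explicit scheduler hint (e.g. passed with
# something like `nova boot --hint force_host=<host>`) can still select
# them, so all the regular deploy machinery applies. Illustrative only.

def maintenance_filter_passes(host_capabilities, scheduler_hints):
    """Return True if this host is a candidate for the request."""
    if not host_capabilities.get('maintenance', False):
        # Nodes not in maintenance are scheduled normally.
        return True
    # Maintenance nodes match only when the request explicitly
    # names them through a scheduler hint.
    requested = scheduler_hints.get('force_host')
    return requested is not None and requested == host_capabilities.get('hostname')
```

This keeps the "hide from ordinary users" property of Case B while still letting an operator boot a diagnostic image onto the exact machine they are investigating.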
Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode
+1 to the idea.

However, I think we should discuss whether the rescue interface is the
appropriate path. Its initial intention was to tie into Nova's rescue
interface, allowing a user whose instance is non-responsive to boot into a
recovery mode and access the data stored within their instance. I think there
are two different use-cases here:

Case A: a user of Nova who somehow breaks their instance and wants to boot
into a "rescue" or "recovery" mode, preserving instance data. This is useful
if, e.g., they lost network access or broke their grub config.

Case B: an operator of the baremetal cloud whose hardware may be
malfunctioning, who wishes to hide that hardware from users of Case A while
they diagnose and fix the underlying problem.

As I see it, Nova's rescue API (and by extension, the same API in Ironic) is
intended for A, but not for B. TripleO's use case includes both of these, and
may be conflating them.

I believe Case A is addressed by the planned driver.rescue interface. As for
Case B, I think the solution is to use different tenants and move the node
between them. This is a more complex problem -- Ironic does not model
tenants, and AFAIK Nova doesn't reserve unallocated compute resources on a
per-tenant basis.

That said, I think we will need a way to indicate "this bare metal node
belongs to that tenant", regardless of the rescue use case.

-Deva

On Fri, Mar 14, 2014 at 5:01 AM, Lucas Alvares Gomes wrote:
> On Wed, Mar 12, 2014 at 8:07 PM, Chris Jones wrote:
>>
>> Hey
>>
>> I wanted to throw out an idea that came to me while I was working on
>> diagnosing some hardware issues in the TripleO CD rack at the sprint last
>> week.
>>
>> Specifically, if a particular node has been dropped from automatic
>> scheduling by the operator, I think it would be super useful to be able to
>> still manually schedule the node. Examples might be that someone is
>> diagnosing a hardware issue and wants to boot an image that has all their
>> favourite diagnostic tools in it, or they might be booting an image they
>> use for updating firmwares, etc (frankly, just being able to boot a
>> generic, unmodified host OS on a node can be super useful if you're trying
>> to crash cart the machine for something hardware related).
>>
>> Any thoughts? :)
>
> +1 I like the idea and think it's quite useful.
>
> Drivers in Ironic already expose a rescue interface [1] (which I don't
> think we have put much thought into yet); perhaps the PXE driver might
> implement something similar to what you want to do here?
>
> [1] https://github.com/openstack/ironic/blob/master/ironic/drivers/base.py#L60
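[Editor's note: the rescue interface Lucas links to is a per-driver extension point in Ironic's driver base classes. A minimal sketch of the shape such an interface might take follows; the class and method names here are assumptions for illustration, and the real definition is in the linked base.py.]

```python
# Illustrative sketch of a per-driver rescue extension point, the kind
# of interface a PXE driver could implement to boot a node into a
# recovery/diagnostic image. Not Ironic's actual code; see the linked
# ironic/drivers/base.py for the real interface.
import abc


class RescueInterface(abc.ABC):
    """Abstract interface a deploy driver may implement for rescue."""

    @abc.abstractmethod
    def rescue(self, task, node):
        """Boot the node into a rescue/recovery environment."""

    @abc.abstractmethod
    def unrescue(self, task, node):
        """Return the node to its normal boot configuration."""
```

A PXE-based implementation would, roughly, repoint the node's boot config at a rescue ramdisk in `rescue()` and restore the original config in `unrescue()`.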
Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode
On Wed, Mar 12, 2014 at 8:07 PM, Chris Jones wrote:
>
> Hey
>
> I wanted to throw out an idea that came to me while I was working on
> diagnosing some hardware issues in the TripleO CD rack at the sprint last
> week.
>
> Specifically, if a particular node has been dropped from automatic
> scheduling by the operator, I think it would be super useful to be able to
> still manually schedule the node. Examples might be that someone is
> diagnosing a hardware issue and wants to boot an image that has all their
> favourite diagnostic tools in it, or they might be booting an image they
> use for updating firmwares, etc (frankly, just being able to boot a
> generic, unmodified host OS on a node can be super useful if you're trying
> to crash cart the machine for something hardware related).
>
> Any thoughts? :)

+1 I like the idea and think it's quite useful.

Drivers in Ironic already expose a rescue interface [1] (which I don't think
we have put much thought into yet); perhaps the PXE driver might implement
something similar to what you want to do here?

[1] https://github.com/openstack/ironic/blob/master/ironic/drivers/base.py#L60
Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode
Excerpts from Chris Jones's message of 2014-03-12 13:07:21 -0700:
> Hey
>
> I wanted to throw out an idea that came to me while I was working on
> diagnosing some hardware issues in the TripleO CD rack at the sprint last
> week.
>
> Specifically, if a particular node has been dropped from automatic
> scheduling by the operator, I think it would be super useful to be able to
> still manually schedule the node. Examples might be that someone is
> diagnosing a hardware issue and wants to boot an image that has all their
> favourite diagnostic tools in it, or they might be booting an image they
> use for updating firmwares, etc (frankly, just being able to boot a
> generic, unmodified host OS on a node can be super useful if you're trying
> to crash cart the machine for something hardware related).
>
> Any thoughts? :)

+1 from me, as I've been in the exact same boat (perhaps with the same piece
of hardware, even ;). I imagine it as a nova scheduler hint that finds its
way into Ironic eventually.