Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode

2014-03-18 Thread Devananda van der Veen
On Tue, Mar 18, 2014 at 12:24 PM, Robert Collins wrote:

> On 15 March 2014 13:07, Devananda van der Veen wrote:
> > +1 to the idea.
> >
> > However, I think we should discuss whether the rescue interface is the
> > appropriate path. Its initial intention was to tie into Nova's rescue
> > interface, allowing a user whose instance is non-responsive to boot into
> > a recovery mode and access the data stored within their instance. I
> > think there are two different use-cases here:
> >
> > Case A: a user of Nova who somehow breaks their instance, and wants to
> > boot into a "rescue" or "recovery" mode, preserving instance data. This
> > is useful if, e.g., they lost network access or broke their grub config.
> >
> > Case B: an operator of the baremetal cloud whose hardware may be
> > malfunctioning, who wishes to hide that hardware from users of Case A
> > while they diagnose and fix the underlying problem.
> >
> > As I see it, Nova's rescue API (and by extension, the same API in
> > Ironic) is intended for A, but not for B. TripleO's use case includes
> > both of these, and may be conflating them.
>
> I agree.
>
> > I believe Case A is addressed by the planned driver.rescue interface.
> > As for Case B, I think the solution is to use different tenants and
> > move the node between them. This is a more complex problem -- Ironic
> > does not model tenants, and AFAIK Nova doesn't reserve unallocated
> > compute resources on a per-tenant basis.
> >
> > That said, I think we will need a way to indicate "this bare metal node
> > belongs to that tenant", regardless of the rescue use case.
>
> I'm not sure Ironic should be involved in scheduling (and giving a
> node to a tenant is a scheduling problem).
>
>
Ironic does not need to make decisions about scheduling for nodes to be
associated to specific tenants. It merely needs to store the tenant_id and
expose it to a (potentially new) filter scheduler that matches on it in a
way that prevents users of Nova from explicitly choosing machines that
"belong" to other tenants. I think the only work needed for this is a new
scheduler filter, a few lines in the Nova driver to expose info to it, and
for the operator to stash a tenant ID in Ironic using the existing API to
update the node.properties field. I don't envision that Nova should ever
change the node->tenant mapping.
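
To make that concrete, a rough sketch of what such a filter could look
like is below. Everything in it is illustrative: the 'tenant_id' stats
key and the driver plumbing that would populate it are assumptions, not
existing code.

    # Hypothetical Nova scheduler filter. Assumes the Ironic Nova driver
    # copies node.properties['tenant_id'] into the host state's stats.
    from nova.scheduler import filters


    class TenantReservationFilter(filters.BaseHostFilter):
        """Pass hosts that are unreserved, or reserved for this tenant."""

        def host_passes(self, host_state, filter_properties):
            owner = host_state.stats.get('tenant_id')
            if owner is None:
                # Unreserved node: any tenant may schedule onto it.
                return True
            # Reserved node: only the owning tenant passes.
            spec = filter_properties.get('request_spec', {})
            props = spec.get('instance_properties', {})
            return owner == props.get('project_id')

The operator side is then something like "ironic node-update <uuid> add
properties/tenant_id=<tenant>" against the existing update API.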


> If I may sketch an alternative - when a node is put into maintenance
> mode, keep publishing it to the scheduler, but add an extra spec to it
> that won't match any request automatically.
>
> Then 'deploy X to a node in maintenance mode' is a simple nova boot with
> a scheduler hint to explicitly choose that machine, and all the
> regular machinery will take place.
>

That should also work :)

I don't see any reason why we can't do both.

-Deva


Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode

2014-03-18 Thread Robert Collins
On 15 March 2014 13:07, Devananda van der Veen wrote:
> +1 to the idea.
>
> However, I think we should discuss whether the rescue interface is the
> appropriate path. Its initial intention was to tie into Nova's rescue
> interface, allowing a user whose instance is non-responsive to boot into a
> recovery mode and access the data stored within their instance. I think
> there are two different use-cases here:
>
> Case A: a user of Nova who somehow breaks their instance, and wants to boot
> into a "rescue" or "recovery" mode, preserving instance data. This is useful
> if, e.g., they lost network access or broke their grub config.
>
> Case B: an operator of the baremetal cloud whose hardware may be
> malfunctioning, who wishes to hide that hardware from users of Case A while
> they diagnose and fix the underlying problem.
>
> As I see it, Nova's rescue API (and by extension, the same API in Ironic) is
> intended for A, but not for B.  TripleO's use case includes both of these,
> and may be conflating them.

I agree.

> I believe Case A is addressed by the planned driver.rescue interface. As for
> Case B, I think the solution is to use different tenants and move the node
> between them. This is a more complex problem -- Ironic does not model
> tenants, and AFAIK Nova doesn't reserve unallocated compute resources on a
> per-tenant basis.
>
> That said, I think we will need a way to indicate "this bare metal node
> belongs to that tenant", regardless of the rescue use case.

I'm not sure Ironic should be involved in scheduling (and giving a
node to a tenant is a scheduling problem).

If I may sketch an alternative - when a node is put into maintenance
mode, keep publishing it to the scheduler, but add an extra spec to it
that won't match any request automatically.

Then 'deploy X to a node in maintenance mode' is a simple nova boot with
a scheduler hint to explicitly choose that machine, and all the
regular machinery will take place.
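
A hand-wavy sketch of the filter side, with every name invented for
illustration (the 'maintenance' stats key and hint don't exist anywhere
today):

    # Illustrative only: a filter enforcing the behaviour sketched above.
    from nova.scheduler import filters


    class MaintenanceFilter(filters.BaseHostFilter):
        """Hide maintenance-mode nodes unless a hint asks for them."""

        def host_passes(self, host_state, filter_properties):
            in_maintenance = host_state.stats.get('maintenance') == 'true'
            hints = filter_properties.get('scheduler_hints') or {}
            asked_for_it = hints.get('maintenance') == 'true'
            # Ordinary requests never land on a maintenance node; a
            # request carrying the hint lands *only* on maintenance nodes.
            return in_maintenance == asked_for_it

The operator's deploy is then something like "nova boot --hint
maintenance=true ...", plus whatever hint pins the exact machine.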

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud



Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode

2014-03-14 Thread Devananda van der Veen
+1 to the idea.

However, I think we should discuss whether the rescue interface is the
appropriate path. Its initial intention was to tie into Nova's rescue
interface, allowing a user whose instance is non-responsive to boot into a
recovery mode and access the data stored within their instance. I think
there are two different use-cases here:

Case A: a user of Nova who somehow breaks their instance, and wants to boot
into a "rescue" or "recovery" mode, preserving instance data. This is
useful if, e.g., they lost network access or broke their grub config.

Case B: an operator of the baremetal cloud whose hardware may be
malfunctioning, who wishes to hide that hardware from users of Case A while
they diagnose and fix the underlying problem.

As I see it, Nova's rescue API (and by extension, the same API in Ironic)
is intended for A, but not for B.  TripleO's use case includes both of
these, and may be conflating them.

I believe Case A is addressed by the planned driver.rescue interface. As
for Case B, I think the solution is to use different tenants and move the
node between them. This is a more complex problem -- Ironic does not model
tenants, and AFAIK Nova doesn't reserve unallocated compute resources on a
per-tenant basis.

That said, I think we will need a way to indicate "this bare metal node
belongs to that tenant", regardless of the rescue use case.

-Deva



On Fri, Mar 14, 2014 at 5:01 AM, Lucas Alvares Gomes wrote:

> On Wed, Mar 12, 2014 at 8:07 PM, Chris Jones wrote:
>
>>
>> Hey
>>
>> I wanted to throw out an idea that came to me while I was working on
>> diagnosing some hardware issues in the Tripleo CD rack at the sprint last
>> week.
>>
>> Specifically, if a particular node has been dropped from automatic
>> scheduling by the operator, I think it would be super useful to be able to
>> still manually schedule the node. Examples might be that someone is
>> diagnosing a hardware issue and wants to boot an image that has all their
>> favourite diagnostic tools in it, or they might be booting an image they
>> use for updating firmwares, etc (frankly, just being able to boot a
>> generic, unmodified host OS on a node can be super useful if you're trying
>> to crash cart the machine for something hardware related).
>>
>> Any thoughts? :)
>>
>
> +1 I like the idea and think it's quite useful.
>
> Drivers in Ironic already expose a rescue interface[1] (which I don't
> think we have put much thought into yet); perhaps the PXE driver might
> implement something similar to what you want to do here?
>
> [1]
> https://github.com/openstack/ironic/blob/master/ironic/drivers/base.py#L60
>


Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode

2014-03-14 Thread Lucas Alvares Gomes
On Wed, Mar 12, 2014 at 8:07 PM, Chris Jones wrote:

>
> Hey
>
> I wanted to throw out an idea that came to me while I was working on
> diagnosing some hardware issues in the Tripleo CD rack at the sprint last
> week.
>
> Specifically, if a particular node has been dropped from automatic
> scheduling by the operator, I think it would be super useful to be able to
> still manually schedule the node. Examples might be that someone is
> diagnosing a hardware issue and wants to boot an image that has all their
> favourite diagnostic tools in it, or they might be booting an image they
> use for updating firmwares, etc (frankly, just being able to boot a
> generic, unmodified host OS on a node can be super useful if you're trying
> to crash cart the machine for something hardware related).
>
> Any thoughts? :)
>

+1 I like the idea and think it's quite useful.

Drivers in Ironic already expose a rescue interface[1] (which I don't think
we have put much thought into yet); perhaps the PXE driver might implement
something similar to what you want to do here?

[1]
https://github.com/openstack/ironic/blob/master/ironic/drivers/base.py#L60
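
For those who don't want to click through, the interface is roughly the
following (an abridged paraphrase from memory -- see the link for the
authoritative version):

    # Abridged paraphrase of the linked RescueInterface; signatures in
    # the real file may differ.
    import abc

    import six


    @six.add_metaclass(abc.ABCMeta)
    class RescueInterface(object):
        """Interface for rescue-related actions."""

        @abc.abstractmethod
        def validate(self, node):
            """Check that the node's driver info supports rescue."""

        @abc.abstractmethod
        def rescue(self, task, node):
            """Boot the node into a rescue environment."""

        @abc.abstractmethod
        def unrescue(self, task, node):
            """Tear down the rescue environment and reboot the node."""

A PXE implementation of rescue() could presumably boot exactly the kind
of diagnostic image Chris describes.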


Re: [openstack-dev] [Ironic] Manual scheduling nodes in maintenance mode

2014-03-12 Thread Clint Byrum
Excerpts from Chris Jones's message of 2014-03-12 13:07:21 -0700:
> Hey
> 
> I wanted to throw out an idea that came to me while I was working on
> diagnosing some hardware issues in the Tripleo CD rack at the sprint last
> week.
> 
> Specifically, if a particular node has been dropped from automatic
> scheduling by the operator, I think it would be super useful to be able to
> still manually schedule the node. Examples might be that someone is
> diagnosing a hardware issue and wants to boot an image that has all their
> favourite diagnostic tools in it, or they might be booting an image they
> use for updating firmwares, etc (frankly, just being able to boot a
> generic, unmodified host OS on a node can be super useful if you're trying
> to crash cart the machine for something hardware related).
> 
> Any thoughts? :)
> 

+1 from me, as I've been in the exact same boat (perhaps the same piece
of hardware even. ;)

I imagine it as a nova scheduler hint that finds its way into Ironic
eventually.
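
For concreteness, the client side might look something like this --
entirely hypothetical, since the 'force_node' hint name is invented and
nothing implements it:

    # Hypothetical sketch with python-novaclient; 'force_node' is an
    # invented scheduler hint, not a real one.
    from novaclient.v1_1 import client

    DIAG_IMAGE = 'uuid-of-an-image-full-of-diagnostic-tools'
    BAREMETAL_FLAVOR = 'id-of-the-matching-baremetal-flavor'
    NODE_UUID = 'uuid-of-the-node-under-diagnosis'

    nova = client.Client('admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')
    nova.servers.create(name='diag-box',
                        image=DIAG_IMAGE,
                        flavor=BAREMETAL_FLAVOR,
                        scheduler_hints={'force_node': NODE_UUID})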
