
Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-04 Thread Vladyslav Drok

On Wed, Jan 4, 2017 at 5:45 PM, Vladyslav Drok  wrote:

> Thanks all for replies!
>
> On Tue, Jan 3, 2017 at 5:16 PM, Jay Faulkner  wrote:
>
>> Hey Vdrok, some comments inline.
>>
>> > On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
>> >
>> > Hi all!
>> >
>> > There is a long-standing problem of resources reporting in ironic virt
>> driver. It's described in a couple of bugs I've found - [0], [1]. Switching
>> to placement API will make things better, but still there are some problems
>> there. For example, there are cases when ironic needs to say "this node is
>> not available", and it reports the vcpus=memory_mb=local_gb as 0 in this
>> case. Placement API does not allow 0s, so in [2] it is proposed to remove
>> inventory records in this case.
>> >
>> > But the whole logic here [3] seems not that obvious to me, so I'd like
>> to discuss when we need to report 0s to placement API. I'm thinking
>> about the following (copy-pasted from my comment on [2]):
>> >
>> >   • If there is an instance_uuid on the node, no matter what
>> provision/power state it's in, consider the resources as used. In case it's
>> an orphan, an admin will need to take some manual action anyway.
>>
>> This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453
>> — basically the Nova resource tracker checks, decides we’re lying about it
>> being used for an instance because Nova’s records don’t show we do, and it
>> re-adds the capacity to the pool.
>>
>
> Aha, I see: after looking at the code a bit more and discussing it with JayF,
> I found that this happens during update_available_resource here
> https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L921-L934,
> where "instances" are all instances assigned to the current host and node.
> Still, I don't really like the fact that _used amount is greater than the  amount
> that is possible here -
> https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/virt/ironic/driver.py#L301-L326,
> as it makes the reported free values negative (I can't find the place
> where they are set to 0 if negative). Maybe we could at least report 0 for
> both available and used amounts?
>

OK, I must be blind: it is set to 0 if negative here
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L938-L939,
so it should be fine, apart from the fact that the used value will be greater
than the available one.
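To make the arithmetic concrete, here is a minimal sketch. The function name and the numbers are hypothetical and not taken from Nova's resource tracker; it only shows that clamping a negative free value to 0 hides the negative number but still leaves used greater than total when the driver reports 0 for an unavailable node.

```python
def compute_free(total: int, used: int) -> int:
    """Free capacity, clamped so it never goes below zero.

    Hypothetical helper illustrating the clamping discussed above;
    it is not the actual Nova resource tracker implementation.
    """
    free = total - used
    # A negative free value is reported as 0 instead.
    return max(0, free)


# The driver reports memory_mb=0 for a node it considers unavailable,
# while the tracker still counts a 4096 MB instance against that node.
total_memory_mb = 0
used_memory_mb = 4096

free_memory_mb = compute_free(total_memory_mb, used_memory_mb)
print(free_memory_mb)                    # 0
print(used_memory_mb > total_memory_mb)  # True: used still exceeds total
```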


>
>
>>
>> Generally I agree with Jay Pipes’ comments — we should have available
>> resources for nodes that can be scheduled to, used resources for nodes with
>> a nova instance, and report no resources whatsoever for nodes in an
>> unschedulable state, such as cleaning, enroll, etc.
>>
>> -
>> Jay Faulkner
>> OSIC
>>
>> >   • If there is no instance_uuid and a node is in cleaning/clean
>> wait after tear down, it is a part of normal node lifecycle, report all
>> resources as used. This means we need a way to determine if it's a manual
>> or automated clean.
>> >   • If there is no instance_uuid, and a node:
>> >   • has a bad power state or
>> >   • is in maintenance
>> >   • or actually in any other case, consider it unavailable,
>> report available resources = used resources = 0. Provision state does not
>> matter in this logic, all cases that we wanted to take into account are
>> described in the first two bullets.
>> >
>> > Any thoughts?
>> >
>> > [0]. https://bugs.launchpad.net/nova/+bug/1402658
>> > [1]. https://bugs.launchpad.net/nova/+bug/1637449
>> > [2]. https://review.openstack.org/414214
>> > [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
>> >
>> > Happy holidays to everyone!
>> > -Vlad
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-04 Thread Vladyslav Drok

Thanks all for replies!

On Tue, Jan 3, 2017 at 5:16 PM, Jay Faulkner  wrote:

> Hey Vdrok, some comments inline.
>
> > On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
> >
> > Hi all!
> >
> > There is a long-standing problem of resources reporting in ironic virt
> driver. It's described in a couple of bugs I've found - [0], [1]. Switching
> to placement API will make things better, but still there are some problems
> there. For example, there are cases when ironic needs to say "this node is
> not available", and it reports the vcpus=memory_mb=local_gb as 0 in this
> case. Placement API does not allow 0s, so in [2] it is proposed to remove
> inventory records in this case.
> >
> > But the whole logic here [3] seems not that obvious to me, so I'd like
> to discuss when we need to report 0s to placement API. I'm thinking
> about the following (copy-pasted from my comment on [2]):
> >
> >   • If there is an instance_uuid on the node, no matter what
> provision/power state it's in, consider the resources as used. In case it's
> an orphan, an admin will need to take some manual action anyway.
>
> This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453
> — basically the Nova resource tracker checks, decides we’re lying about it
> being used for an instance because Nova’s records don’t show we do, and it
> re-adds the capacity to the pool.
>

Aha, I see: after looking at the code a bit more and discussing it with JayF,
I found that this happens during update_available_resource here
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L921-L934,
where "instances" are all instances assigned to the current host and node.
Still, I don't really like the fact that _used amount is greater than the  amount
that is possible here -
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/virt/ironic/driver.py#L301-L326,
as it makes the reported free values negative (I can't find the place
where they are set to 0 if negative). Maybe we could at least report 0 for
both available and used amounts?


>
> Generally I agree with Jay Pipes’ comments — we should have available
> resources for nodes that can be scheduled to, used resources for nodes with
> a nova instance, and report no resources whatsoever for nodes in an
> unschedulable state, such as cleaning, enroll, etc.
>
> -
> Jay Faulkner
> OSIC
>
> >   • If there is no instance_uuid and a node is in cleaning/clean
> wait after tear down, it is a part of normal node lifecycle, report all
> resources as used. This means we need a way to determine if it's a manual
> or automated clean.
> >   • If there is no instance_uuid, and a node:
> >   • has a bad power state or
> >   • is in maintenance
> >   • or actually in any other case, consider it unavailable,
> report available resources = used resources = 0. Provision state does not
> matter in this logic, all cases that we wanted to take into account are
> described in the first two bullets.
> >
> > Any thoughts?
> >
> > [0]. https://bugs.launchpad.net/nova/+bug/1402658
> > [1]. https://bugs.launchpad.net/nova/+bug/1637449
> > [2]. https://review.openstack.org/414214
> > [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
> >
> > Happy holidays to everyone!
> > -Vlad

Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-03 Thread Jay Faulkner

Hey Vdrok, some comments inline.

> On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
> 
> Hi all!
> 
> There is a long-standing problem of resources reporting in ironic virt
> driver. It's described in a couple of bugs I've found - [0], [1]. Switching 
> to placement API will make things better, but still there are some problems 
> there. For example, there are cases when ironic needs to say "this node is 
> not available", and it reports the vcpus=memory_mb=local_gb as 0 in this 
> case. Placement API does not allow 0s, so in [2] it is proposed to remove 
> inventory records in this case.
> 
> But the whole logic here [3] seems not that obvious to me, so I'd like to 
> discuss when we need to report 0s to placement API. I'm thinking about the
> following (copy-pasted from my comment on [2]):
> 
>   • If there is an instance_uuid on the node, no matter what 
> provision/power state it's in, consider the resources as used. In case it's 
> an orphan, an admin will need to take some manual action anyway.

This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453 — 
basically the Nova resource tracker checks, decides we’re lying about it being 
used for an instance because Nova’s records don’t show we do, and it re-adds the
capacity to the pool.
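A minimal sketch of the behaviour described in bug 1503453, as summarized above. The classes, function names and numbers are hypothetical simplifications, not Nova's actual resource tracker: usage is derived only from Nova's own instance records for the node, so an instance_uuid that Ironic still holds does not keep the node counted as used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IronicNode:
    """Hypothetical, simplified view of an Ironic node."""
    uuid: str
    memory_mb: int
    instance_uuid: Optional[str] = None  # what Ironic believes is deployed


@dataclass
class NovaInstance:
    """Hypothetical, simplified view of a Nova instance record."""
    uuid: str
    node: str
    memory_mb: int


def tracker_used_memory(node: IronicNode, nova_instances: List[NovaInstance]) -> int:
    # Usage comes only from Nova's instance records for this node;
    # node.instance_uuid is never consulted.
    return sum(inst.memory_mb for inst in nova_instances if inst.node == node.uuid)


def tracker_free_memory(node: IronicNode, nova_instances: List[NovaInstance]) -> int:
    return max(0, node.memory_mb - tracker_used_memory(node, nova_instances))


# An orphan: Ironic still records an instance on the node, but Nova has no
# matching instance, so the whole node is counted as free capacity again.
orphan = IronicNode(uuid="node-1", memory_mb=131072, instance_uuid="lost-instance")
print(tracker_free_memory(orphan, []))  # 131072
```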

Generally I agree with Jay Pipes’ comments — we should have available resources 
for nodes that can be scheduled to, used resources for nodes with a nova 
instance, and report no resources whatsoever for nodes in an unschedulable 
state, such as cleaning, enroll, etc.

-
Jay Faulkner
OSIC

>   • If there is no instance_uuid and a node is in cleaning/clean wait 
> after tear down, it is a part of normal node lifecycle, report all resources 
> as used. This means we need a way to determine if it's a manual or automated 
> clean.
>   • If there is no instance_uuid, and a node:
>   • has a bad power state or
>   • is in maintenance
>   • or actually in any other case, consider it unavailable, 
> report available resources = used resources = 0. Provision state does not 
> matter in this logic, all cases that we wanted to take into account are 
> described in the first two bullets.
> 
> Any thoughts?
> 
> [0]. https://bugs.launchpad.net/nova/+bug/1402658
> [1]. https://bugs.launchpad.net/nova/+bug/1637449
> [2]. https://review.openstack.org/414214
> [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
> 
> Happy holidays to everyone!
> -Vlad

Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-03 Thread Pavlo Shchelokovskyy

Hi,

A comment about 'report as full' vs 'remove from inventory':

On Mon, Jan 2, 2017 at 7:53 PM, Jay Pipes  wrote:

> Great questions, Vlad. Comments inline.
>
> On 12/30/2016 11:40 AM, Vladyslav Drok wrote:
>
>> Hi all!
>>
>> There is a long-standing problem of resources reporting in ironic virt
>> driver.
>>
>
> That would be an understatement :)
>
>> It's described in a couple of bugs I've found - [0], [1].
>
>> Switching to placement API will make things better, but still there are
>> some problems there. For example, there are cases when ironic needs to
>> say "this node is not available", and it reports the
>> vcpus=memory_mb=local_gb as 0 in this case. Placement API does not allow
>> 0s, so in [2] it is proposed to remove inventory records in this case.
>>
>
> Correct.
>
>> But the whole logic here [3] seems not that obvious to me, so I'd like
>> to discuss when we need to report 0s to placement API. I'm thinking
>> about the following (copy-pasted from my comment on [2]):
>>
>>   * If there is an instance_uuid on the node, no matter what
>> provision/power state it's in, consider the resources as used. In
>> case it's an orphan, an admin will need to take some manual action
>> anyway.
>>
>
> The single source of truth for Ironic instances is the Ironic database. If
> Ironic's database says that a node is consumed by an instance, then it
> should be considered by Nova to be consumed.
>

Well, it is nova that marks the node as consumed by setting the
instance_uuid field on it :) The question is when the right time to
remove it is... (see my next comment below). Currently it is removed before
teardown/undeploy, so a node in the CLEANING state already has no
instance_uuid set.


>>   * If there is no instance_uuid and a node is in cleaning/clean wait
>> after tear down, it is a part of normal node lifecycle, report all
>> resources as used. This means we need a way to determine if it's a
>> manual or automated clean.
>>
>
> I don't see a need to determine manual vs. automated clean. The node is in
> a clean state; therefore the inventory of resources on that node is not
> available for a consumer of those resources to consume. So, the inventory
> should be deleted in Nova. This inventory should be re-added if and when
> the node is in a state that a consumer can grab it.
>
>
There is a difference between "removing the resource from available" vs
"declaring the resource fully consumed" - the end result for scheduling is
the same (those resources are not being scheduled to), but I am worried
about any cloud-wide monitoring mechanisms that may start alerting about
hypervisors disappearing / total cloud capacity going down even though
everything is operating normally.

IMO, during the happy path for a nova instance on an ironic node (node
available -> nova does deploy -> node active -> nova does undeploy -> node is
available, with all intermediate *ing / *_wait states), the node should be
reported as "fully consumed by instance", as cleaning in this case is a
standard part of a healthy node lifecycle. Only when something off the happy
path happens (maintenance, deploy or cleaning error) should the node be
removed from the overall cloud capacity. This is why we might have to
differentiate between automated cleaning (happy path) and manual cleaning
(usually some manual recovery from an error). For the same reason I'd also
suggest removing the instance_uuid from the ironic node at the end of
cleaning; that should make it clearer which stage the node is in right now.
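To illustrate the monitoring concern, here is a minimal sketch with a hypothetical function name and node sizes (not any real Nova or monitoring API). It aggregates memory over three nodes, one of which is cleaning: both strategies leave the same free capacity for scheduling, but deleting the inventory makes the total cloud capacity drop.

```python
from typing import Dict, List


def cloud_capacity(node_memory_mb: List[int], cleaning: List[bool],
                   strategy: str) -> Dict[str, int]:
    """Aggregate memory capacity across nodes, some of which are cleaning.

    strategy="remove": cleaning nodes have their inventory deleted.
    strategy="consume": cleaning nodes stay in inventory but count as fully used.
    """
    if strategy not in ("remove", "consume"):
        raise ValueError(f"unknown strategy: {strategy}")
    total = used = 0
    for size, is_cleaning in zip(node_memory_mb, cleaning):
        if is_cleaning and strategy == "remove":
            continue  # node disappears from the totals entirely
        total += size
        if is_cleaning:
            used += size  # node stays in the totals, but nothing is free on it
    return {"total": total, "used": used, "free": total - used}


nodes = [131072, 131072, 131072]
cleaning = [False, False, True]
print(cloud_capacity(nodes, cleaning, "remove"))
# {'total': 262144, 'used': 0, 'free': 262144}
print(cloud_capacity(nodes, cleaning, "consume"))
# {'total': 393216, 'used': 131072, 'free': 262144}
```

The free value, which is what matters for scheduling, is identical in both cases; only the reported total and used figures differ.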


>>   * If there is no instance_uuid, and a node:
>>   o has a bad power state or
>>   o is in maintenance
>>   o or actually in any other case, consider it unavailable, report
>> available resources = used resources = 0. Provision state does
>> not matter in this logic, all cases that we wanted to take into
>> account are described in the first two bullets.
>>
>
> Correct. If there is no instance UUID for the node, that means there's no
> allocation for it. If there's no allocation for the node, its inventory can
> and should be deleted if the node cannot be consumed by an instance (for
> whatever reason).
>
> Best,
> -jay
>
>> Any thoughts?
>>
>> [0]. https://bugs.launchpad.net/nova/+bug/1402658
>> [1]. https://bugs.launchpad.net/nova/+bug/1637449
>> [2]. https://review.openstack.org/414214
>> [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
>>
>> Happy holidays to everyone!
>> -Vlad

Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-02 Thread Jay Pipes

Great questions, Vlad. Comments inline.

On 12/30/2016 11:40 AM, Vladyslav Drok wrote:

> Hi all!
>
> There is a long-standing problem of resources reporting in ironic virt
> driver.

That would be an understatement :)

> It's described in a couple of bugs I've found - [0], [1].

> Switching to placement API will make things better, but still there are
> some problems there. For example, there are cases when ironic needs to
> say "this node is not available", and it reports the
> vcpus=memory_mb=local_gb as 0 in this case. Placement API does not allow
> 0s, so in [2] it is proposed to remove inventory records in this case.

Correct.

> But the whole logic here [3] seems not that obvious to me, so I'd like
> to discuss when we need to report 0s to placement API. I'm thinking
> about the following (copy-pasted from my comment on [2]):
>
> * If there is an instance_uuid on the node, no matter what
>   provision/power state it's in, consider the resources as used. In
>   case it's an orphan, an admin will need to take some manual action
>   anyway.

The single source of truth for Ironic instances is the Ironic database.
If Ironic's database says that a node is consumed by an instance, then
it should be considered by Nova to be consumed.

> * If there is no instance_uuid and a node is in cleaning/clean wait
>   after tear down, it is a part of normal node lifecycle, report all
>   resources as used. This means we need a way to determine if it's a
>   manual or automated clean.

I don't see a need to determine manual vs. automated clean. The node is
in a clean state; therefore the inventory of resources on that node is
not available for a consumer of those resources to consume. So, the
inventory should be deleted in Nova. This inventory should be re-added
if and when the node is in a state that a consumer can grab it.

> * If there is no instance_uuid, and a node:
>   o has a bad power state or
>   o is in maintenance
>   o or actually in any other case, consider it unavailable, report
>     available resources = used resources = 0. Provision state does
>     not matter in this logic, all cases that we wanted to take into
>     account are described in the first two bullets.

Correct. If there is no instance UUID for the node, that means there's
no allocation for it. If there's no allocation for the node, its
inventory can and should be deleted if the node cannot be consumed by an
instance (for whatever reason).

Best,
-jay

> Any thoughts?
>
> [0]. https://bugs.launchpad.net/nova/+bug/1402658
> [1]. https://bugs.launchpad.net/nova/+bug/1637449
> [2]. https://review.openstack.org/414214
> [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
>
> Happy holidays to everyone!
> -Vlad


[openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2016-12-30 Thread Vladyslav Drok

Hi all!

There is a long-standing problem of resources reporting in ironic virt
driver. It's described in a couple of bugs I've found - [0], [1]. Switching
to placement API will make things better, but still there are some problems
there. For example, there are cases when ironic needs to say "this node is
not available", and it reports the vcpus=memory_mb=local_gb as 0 in this
case. Placement API does not allow 0s, so in [2] it is proposed to remove
inventory records in this case.
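A minimal sketch of "remove inventory records instead of reporting 0s", assuming the driver's vcpus/memory_mb/local_gb values map to the standard Placement resource classes VCPU, MEMORY_MB and DISK_GB. The record fields follow the Placement inventory schema (total, reserved, min_unit, max_unit, step_size, allocation_ratio), but the function name and the values chosen for reserved, min_unit, step_size and allocation_ratio are illustrative assumptions, not what the driver or review [2] actually uses. An empty result stands for "delete all inventory for this node".

```python
from typing import Dict

# Assumed mapping from the driver's resource keys to Placement resource classes.
RESOURCE_CLASSES = {"vcpus": "VCPU", "memory_mb": "MEMORY_MB", "local_gb": "DISK_GB"}


def to_placement_inventory(resources: Dict[str, int]) -> Dict[str, Dict[str, object]]:
    """Build Placement-style inventory records, omitting any resource whose total is 0.

    Hypothetical helper: Placement does not accept a total of 0, so a
    resource with no capacity gets no record at all rather than a zero.
    """
    inventory: Dict[str, Dict[str, object]] = {}
    for key, resource_class in RESOURCE_CLASSES.items():
        amount = resources.get(key, 0)
        if amount <= 0:
            continue  # drop the record instead of reporting total=0
        inventory[resource_class] = {
            "total": amount,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": amount,
            "step_size": 1,
            "allocation_ratio": 1.0,
        }
    return inventory


print(to_placement_inventory({"vcpus": 0, "memory_mb": 0, "local_gb": 0}))  # {}
print(sorted(to_placement_inventory({"vcpus": 24, "memory_mb": 131072, "local_gb": 500})))
# ['DISK_GB', 'MEMORY_MB', 'VCPU']
```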

But the whole logic here [3] seems not that obvious to me, so I'd like to
discuss when we need to report 0s to placement API. I'm thinking about
the following (copy-pasted from my comment on [2]):

- If there is an instance_uuid on the node, no matter what
  provision/power state it's in, consider the resources as used. In case it's
  an orphan, an admin will need to take some manual action anyway.
- If there is no instance_uuid and a node is in cleaning/clean wait
  after tear down, it is a part of normal node lifecycle, report all
  resources as used. This means we need a way to determine if it's a manual
  or automated clean.
- If there is no instance_uuid, and a node:
  - has a bad power state or
  - is in maintenance
  - or actually in any other case, consider it unavailable, report
    available resources = used resources = 0. Provision state does not matter
    in this logic, all cases that we wanted to take into account are
    described in the first two bullets.
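The three rules above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the Node class and its automated_clean flag are hypothetical (the proposal itself notes that a way to tell manual from automated cleaning is still needed), the state strings are assumed to match Ironic's provision and power state names, and the "available" branch for healthy, unclaimed nodes is an assumption the proposal leaves implicit.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed Ironic state names; treat them as illustrative.
CLEANING_STATES = {"cleaning", "clean wait"}
BAD_POWER_STATES = {None, "error"}


@dataclass
class Node:
    """Hypothetical, simplified view of the Ironic node fields the rules use."""
    provision_state: str
    power_state: Optional[str]
    maintenance: bool = False
    instance_uuid: Optional[str] = None
    # Hypothetical flag: True if cleaning was started automatically by tear down.
    automated_clean: bool = False


def classify(node: Node) -> str:
    """Return 'used', 'available' or 'unavailable' following the proposed rules."""
    # Rule 1: a node with an instance_uuid counts as used, whatever its state.
    if node.instance_uuid is not None:
        return "used"
    # Rule 2: automated cleaning after tear down is normal lifecycle, so used.
    if node.provision_state in CLEANING_STATES and node.automated_clean:
        return "used"
    # Assumption: a healthy, unclaimed node ready for deployment is available.
    if (node.provision_state == "available" and not node.maintenance
            and node.power_state not in BAD_POWER_STATES):
        return "available"
    # Rule 3: everything else (bad power state, maintenance, ...) is unavailable.
    return "unavailable"


print(classify(Node("active", "power on", instance_uuid="inst-1")))    # used
print(classify(Node("clean wait", "power on", automated_clean=True)))  # used
print(classify(Node("available", "power off")))                        # available
print(classify(Node("available", "power off", maintenance=True)))      # unavailable
```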

Any thoughts?

[0]. https://bugs.launchpad.net/nova/+bug/1402658
[1]. https://bugs.launchpad.net/nova/+bug/1637449
[2]. https://review.openstack.org/414214
[3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262

Happy holidays to everyone!
-Vlad
