Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-04 Thread Vladyslav Drok
On Wed, Jan 4, 2017 at 5:45 PM, Vladyslav Drok  wrote:

> Thanks all for the replies!
>
> On Tue, Jan 3, 2017 at 5:16 PM, Jay Faulkner  wrote:
>
>> Hey Vdrok, some comments inline.
>>
>> > On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
>> >
>> > Hi all!
>> >
>> > There is a long-standing problem with resource reporting in the ironic
>> > virt driver. It's described in a couple of bugs I've found - [0], [1].
>> > Switching to the placement API will make things better, but there are
>> > still some problems. For example, there are cases when ironic needs to
>> > say "this node is not available", and it reports vcpus=memory_mb=local_gb
>> > as 0 in this case. The placement API does not allow 0s, so in [2] it is
>> > proposed to remove the inventory records in this case.
>> >
>> > But the whole logic here [3] does not seem that obvious to me, so I'd
>> > like to discuss when we need to report 0s to the placement API. I'm
>> > thinking about the following (copy-pasted from my comment on [2]):
>> >
>> >   • If there is an instance_uuid on the node, no matter what
>> >     provision/power state it's in, consider the resources as used. If
>> >     it's an orphan, an admin will need to take some manual action anyway.
>>
>> This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453:
>> basically, the Nova resource tracker checks, decides we’re lying about the
>> node being used for an instance because Nova’s records don’t show that it
>> is, and re-adds the capacity to the pool.
>>
>
> Aha, I see. After looking at the code a bit more and discussing it with
> JayF, I found that this happens during update_available_resource here
> https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L921-L934,
> where "instances" are all instances assigned to the current host and node.
> However, I don't really like the fact that the _used amount is greater
> than the  amount that is possible here -
> https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/virt/ironic/driver.py#L301-L326,
> as it makes the reported free values negative (I can't find the place
> where they are set to 0 if negative). Maybe we could at least report 0 for
> both the available and used amounts?
>

OK, I must be blind: it is set to 0 if negative here
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L938-L939,
so it should be fine, apart from the fact that the used value will be
greater than the available one.
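A simplified model of the arithmetic discussed here, assuming free = total - used per resource. The names `total`, `used`, `free_amount()` and the example values are illustrative, not the resource tracker's actual fields or code. When more is counted as used than the node reports as its total, the raw free value goes negative; clamping it at 0 hides that from consumers, while the used > total inconsistency itself remains.

```python
from typing import Tuple


def free_amount(total: int, used: int) -> Tuple[int, int]:
    """Return (raw_free, clamped_free) for one resource.

    Hypothetical helper: raw_free goes negative when used exceeds total;
    clamped_free is floored at 0, which is the behaviour described above.
    """
    raw_free = total - used
    return raw_free, max(0, raw_free)


# Hypothetical case: a node reported with 0 total while an instance's
# usage is still counted against it.
print(free_amount(0, 16384))      # (-16384, 0)
print(free_amount(16384, 16384))  # (0, 0)
```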


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-04 Thread Vladyslav Drok
Thanks all for the replies!

On Tue, Jan 3, 2017 at 5:16 PM, Jay Faulkner  wrote:

> Hey Vdrok, some comments inline.
>
> > On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
> >
> > Hi all!
> >
> > There is a long-standing problem with resource reporting in the ironic
> > virt driver. It's described in a couple of bugs I've found - [0], [1].
> > Switching to the placement API will make things better, but there are
> > still some problems. For example, there are cases when ironic needs to
> > say "this node is not available", and it reports vcpus=memory_mb=local_gb
> > as 0 in this case. The placement API does not allow 0s, so in [2] it is
> > proposed to remove the inventory records in this case.
> >
> > But the whole logic here [3] does not seem that obvious to me, so I'd
> > like to discuss when we need to report 0s to the placement API. I'm
> > thinking about the following (copy-pasted from my comment on [2]):
> >
> >   • If there is an instance_uuid on the node, no matter what
> >     provision/power state it's in, consider the resources as used. If
> >     it's an orphan, an admin will need to take some manual action anyway.
>
> This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453:
> basically, the Nova resource tracker checks, decides we’re lying about the
> node being used for an instance because Nova’s records don’t show that it
> is, and re-adds the capacity to the pool.
>

Aha, I see. After looking at the code a bit more and discussing it with JayF,
I found that this happens during update_available_resource here
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/compute/resource_tracker.py#L921-L934,
where "instances" are all instances assigned to the current host and node.
However, I don't really like the fact that the _used amount is greater
than the  amount that is possible here -
https://github.com/openstack/nova/blob/372452a1f703115310ea3400f9f636829759b80f/nova/virt/ironic/driver.py#L301-L326,
as it makes the reported free values negative (I can't find the place
where they are set to 0 if negative). Maybe we could at least report 0 for
both the available and used amounts?




Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-03 Thread Jay Faulkner
Hey Vdrok, some comments inline.

> On Dec 30, 2016, at 8:40 AM, Vladyslav Drok  wrote:
> 
> Hi all!
> 
> There is a long-standing problem with resource reporting in the ironic virt
> driver. It's described in a couple of bugs I've found - [0], [1]. Switching
> to the placement API will make things better, but there are still some
> problems. For example, there are cases when ironic needs to say "this node
> is not available", and it reports vcpus=memory_mb=local_gb as 0 in this
> case. The placement API does not allow 0s, so in [2] it is proposed to
> remove the inventory records in this case.
>
> But the whole logic here [3] does not seem that obvious to me, so I'd like
> to discuss when we need to report 0s to the placement API. I'm thinking
> about the following (copy-pasted from my comment on [2]):
>
>   • If there is an instance_uuid on the node, no matter what
>     provision/power state it's in, consider the resources as used. If it's
>     an orphan, an admin will need to take some manual action anyway.

This won’t work, because of https://bugs.launchpad.net/nova/+bug/1503453:
basically, the Nova resource tracker checks, decides we’re lying about the
node being used for an instance because Nova’s records don’t show that it is,
and re-adds the capacity to the pool.
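A toy model of the behaviour described above, under a simplifying assumption (taken from the explanation here, not from Nova's actual code): the tracker discards the driver's "used" value and recomputes usage only from the instances Nova has recorded for that host and node. The `Instance` type, `recompute_used()` helper, host/node names and sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    host: str        # hypothetical compute service host
    node: str        # hypothetical ironic node identifier
    memory_mb: int


def recompute_used(host: str, node: str, instances: List[Instance]) -> int:
    """Sum usage only over instances Nova knows are on this host and node."""
    return sum(i.memory_mb for i in instances
               if i.host == host and i.node == node)


# The driver reports an orphaned node (instance_uuid set in Ironic, but no
# matching Nova instance) as fully used ...
driver_total, driver_used = 16384, 16384
nova_instances = [Instance("compute-1", "node-b", 8192)]
# ... but the tracker recomputes node-a's usage from Nova's records only.
tracker_used = recompute_used("compute-1", "node-a", nova_instances)
print(driver_used, tracker_used)    # 16384 0
print(driver_total - tracker_used)  # 16384 is free again
```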

Generally I agree with Jay Pipes’ comments: we should have available resources
for nodes that can be scheduled to, used resources for nodes with a nova
instance, and report no resources whatsoever for nodes in an unschedulable
state, such as cleaning, enroll, etc.
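The three-way policy above could be sketched as a small classifier. This is a minimal sketch under assumptions: the `Report` enum, the `Node` fields and the set of schedulable states are hypothetical, treating maintenance as unschedulable is an extra assumption, and "no resources whatsoever" is modelled as having no inventory at all rather than a 0 total.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class Report(enum.Enum):
    AVAILABLE = "available"    # inventory present, nothing allocated
    USED = "used"              # inventory present, consumed by an instance
    NO_INVENTORY = "none"      # inventory removed entirely


# Hypothetical: provision states in which a node can be scheduled to.
SCHEDULABLE_STATES = {"available"}


@dataclass
class Node:
    instance_uuid: Optional[str]
    provision_state: str
    maintenance: bool = False  # assumption: maintenance is unschedulable


def classify(node: Node) -> Report:
    """Map a node to one of the three reporting outcomes."""
    if node.instance_uuid is not None:
        return Report.USED
    if node.provision_state in SCHEDULABLE_STATES and not node.maintenance:
        return Report.AVAILABLE
    # Unschedulable states such as cleaning or enroll report nothing.
    return Report.NO_INVENTORY
```

Keeping "no inventory" as a separate outcome, instead of a 0 total, avoids the 0s that the placement API rejects.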

-
Jay Faulkner
OSIC




Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-03 Thread Pavlo Shchelokovskyy
Hi,

A comment about 'report as full' vs. 'remove from inventory'.

On Mon, Jan 2, 2017 at 7:53 PM, Jay Pipes  wrote:

> Great questions, Vlad. Comments inline.
>
> On 12/30/2016 11:40 AM, Vladyslav Drok wrote:
>
>> Hi all!
>>
>> There is a long-standing problem with resource reporting in the ironic
>> virt driver.
>>
>
> That would be an understatement :)
>
> > It's described in a couple of bugs I've found - [0], [1].
>
>> Switching to the placement API will make things better, but there are
>> still some problems. For example, there are cases when ironic needs to
>> say "this node is not available", and it reports
>> vcpus=memory_mb=local_gb as 0 in this case. The placement API does not
>> allow 0s, so in [2] it is proposed to remove the inventory records in
>> this case.
>>
>
> Correct.
>
>> But the whole logic here [3] does not seem that obvious to me, so I'd
>> like to discuss when we need to report 0s to the placement API. I'm
>> thinking about the following (copy-pasted from my comment on [2]):
>>
>>   * If there is an instance_uuid on the node, no matter what
>>     provision/power state it's in, consider the resources as used. If
>>     it's an orphan, an admin will need to take some manual action
>>     anyway.
>>
>
> The single source of truth for Ironic instances is the Ironic database. If
> Ironic's database says that a node is consumed by an instance, then it
> should be considered by Nova to be consumed.
>

Well, it is nova that marks the node as consumed by an instance, by setting
the instance_uuid field on the node :) The question is when the right time
to remove it is... (see my next comment below). Currently it is removed
before teardown/undeploy, so a node in the CLEANING state already has no
instance_uuid set.


>>   * If there is no instance_uuid and a node is in cleaning/clean wait
>>     after tear down, it is part of the normal node lifecycle, so report
>>     all resources as used. This means we need a way to determine whether
>>     it's a manual or an automated clean.
>>
>
> I don't see a need to determine manual vs. automated clean. The node is in
> a clean state; therefore the inventory of resources on that node is not
> available for a consumer of those resources to consume. So, the inventory
> should be deleted in Nova. This inventory should be re-added if and when
> the node is in a state where a consumer can grab it.
>
>
There is a difference between "removing the resource from the available
pool" and "declaring the resource fully consumed". The end result for
scheduling is the same (those resources are not scheduled to), but I am
worried about any cloud-wide monitoring mechanisms that may start alerting
about hypervisors disappearing or total cloud capacity going down, even
though everything is operating normally.
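To make the monitoring concern concrete, here is a toy comparison, assuming a cloud-wide "total capacity" metric computed as the sum of every node's inventory total (a hypothetical metric, not a specific Nova or monitoring API; node names and sizes are made up). Both approaches leave nothing schedulable on the cleaning node, but only deleting its inventory makes the total drop.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: node name -> (total, used); None means the node's
# inventory has been removed.
Inventory = Dict[str, Optional[Tuple[int, int]]]


def capacity(inv: Inventory) -> Tuple[int, int]:
    """Return (total capacity, schedulable free capacity) across all nodes."""
    total = sum(v[0] for v in inv.values() if v is not None)
    free = sum(v[0] - v[1] for v in inv.values() if v is not None)
    return total, free


# node-a is available; node-b is being cleaned after an undeploy.
removed = {"node-a": (16384, 0), "node-b": None}
fully_consumed = {"node-a": (16384, 0), "node-b": (16384, 16384)}
print(capacity(removed))         # (16384, 16384)
print(capacity(fully_consumed))  # (32768, 16384)
```

The free (schedulable) capacity is identical in both cases; only the total that a monitoring system would watch differs.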

IMO, during the happy path for a nova instance on an ironic node (node
available -> nova does deploy -> node active -> nova does undeploy -> node
available, with all the intermediate *ing / *_wait states), the node should
be reported as "fully consumed by an instance", as cleaning in this case is
a standard part of a healthy node lifecycle. Only when something off the
happy path happens (maintenance, deploy or cleaning error) should the node
be removed from the overall cloud capacity. This is why we might have to
differentiate between automated cleaning (happy path) and manual cleaning
(usually some manual recovery from an error). For this reason, I'd also
suggest removing the instance_uuid from the ironic node at the end of
cleaning, which should make it clearer which stage the node is in right
now.
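This suggestion can be sketched as a split between happy-path states, where the node is reported as fully consumed, and everything else, where it is removed from capacity. Minimal sketch, assuming hypothetical state sets (the names mirror Ironic-style provision states but are illustrative) and a hypothetical `automated_clean` flag standing in for the manual vs. automated cleaning distinction that would still need to be exposed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical state names for the happy path described above.
IN_USE_STATES = {"deploying", "wait call-back", "active", "deleting"}
CLEANING_STATES = {"cleaning", "clean wait"}


@dataclass
class Node:
    total: int
    provision_state: str
    maintenance: bool = False
    automated_clean: bool = False  # hypothetical manual vs automated flag


def report(node: Node) -> Optional[Tuple[int, int]]:
    """Return (total, used), or None to remove the node from cloud capacity."""
    if node.maintenance:
        return None                      # off the happy path
    if node.provision_state == "available":
        return node.total, 0             # free for scheduling
    if node.provision_state in IN_USE_STATES:
        return node.total, node.total    # consumed by an instance
    if node.provision_state in CLEANING_STATES and node.automated_clean:
        return node.total, node.total    # normal post-undeploy cleaning
    return None                          # manual cleaning, errors, etc.
```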


>>   * If there is no instance_uuid, and a node:
>>       o has a bad power state, or
>>       o is in maintenance,
>>       o or actually in any other case, consider it unavailable and report
>>         available resources = used resources = 0. The provision state
>>         does not matter in this logic; all the cases we wanted to take
>>         into account are described in the first two bullets.
>>
>
> Correct. If there is no instance UUID for the node, that means there's no
> allocation for it. If there's no allocation for the node, its inventory can
> and should be deleted if the node cannot be consumed by an instance (for
> whatever reason).
>
> Best,
> -jay
>

Re: [openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2017-01-02 Thread Jay Pipes

Great questions, Vlad. Comments inline.

On 12/30/2016 11:40 AM, Vladyslav Drok wrote:

> Hi all!
>
> There is a long-standing problem with resource reporting in the ironic virt
> driver.


That would be an understatement :)

> It's described in a couple of bugs I've found - [0], [1].

> Switching to the placement API will make things better, but there are
> still some problems. For example, there are cases when ironic needs to
> say "this node is not available", and it reports
> vcpus=memory_mb=local_gb as 0 in this case. The placement API does not
> allow 0s, so in [2] it is proposed to remove the inventory records in
> this case.


Correct.


> But the whole logic here [3] does not seem that obvious to me, so I'd
> like to discuss when we need to report 0s to the placement API. I'm
> thinking about the following (copy-pasted from my comment on [2]):
>
>   * If there is an instance_uuid on the node, no matter what
>     provision/power state it's in, consider the resources as used. If
>     it's an orphan, an admin will need to take some manual action
>     anyway.


The single source of truth for Ironic instances is the Ironic database. 
If Ironic's database says that a node is consumed by an instance, then 
it should be considered by Nova to be consumed.



>   * If there is no instance_uuid and a node is in cleaning/clean wait
>     after tear down, it is part of the normal node lifecycle, so report
>     all resources as used. This means we need a way to determine whether
>     it's a manual or an automated clean.


I don't see a need to determine manual vs. automated clean. The node is
in a clean state; therefore the inventory of resources on that node is
not available for a consumer of those resources to consume. So, the
inventory should be deleted in Nova. This inventory should be re-added
if and when the node is in a state where a consumer can grab it.



>   * If there is no instance_uuid, and a node:
>       o has a bad power state, or
>       o is in maintenance,
>       o or actually in any other case, consider it unavailable and report
>         available resources = used resources = 0. The provision state
>         does not matter in this logic; all the cases we wanted to take
>         into account are described in the first two bullets.


Correct. If there is no instance UUID for the node, that means there's 
no allocation for it. If there's no allocation for the node, its 
inventory can and should be deleted if the node cannot be consumed by an 
instance (for whatever reason).


Best,
-jay







[openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

2016-12-30 Thread Vladyslav Drok
Hi all!

There is a long-standing problem with resource reporting in the ironic virt
driver. It's described in a couple of bugs I've found - [0], [1]. Switching
to the placement API will make things better, but there are still some
problems. For example, there are cases when ironic needs to say "this node
is not available", and it reports vcpus=memory_mb=local_gb as 0 in this
case. The placement API does not allow 0s, so in [2] it is proposed to
remove the inventory records in this case.
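To illustrate the proposal in [2], here is a minimal sketch, assuming a hypothetical helper `build_inventory()` that turns per-node totals into placement-style inventory records. It only models the constraint stated here (placement does not allow 0s), so zero-valued resources are omitted rather than sent. VCPU, MEMORY_MB and DISK_GB are placement's standard resource class names; the helper, the record shape and the example values are illustrative, not Nova's actual code.

```python
from typing import Dict

# Driver resource keys mapped to placement's standard resource classes.
RESOURCE_CLASSES = {
    "vcpus": "VCPU",
    "memory_mb": "MEMORY_MB",
    "local_gb": "DISK_GB",
}


def build_inventory(resources: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Build placement-style inventory records from driver-reported totals.

    Hypothetical helper: resource classes with a total of 0 are left out
    entirely instead of being reported as 0; an empty result means
    "delete all inventory records for this node".
    """
    inventory = {}
    for key, rc in RESOURCE_CLASSES.items():
        total = resources.get(key, 0)
        if total > 0:
            inventory[rc] = {"total": total}
    return inventory


available = build_inventory({"vcpus": 8, "memory_mb": 16384, "local_gb": 100})
unavailable = build_inventory({"vcpus": 0, "memory_mb": 0, "local_gb": 0})
print(unavailable)  # prints {}: remove the node's inventory records
```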

But the whole logic here [3] does not seem that obvious to me, so I'd like
to discuss when we need to report 0s to the placement API. I'm thinking
about the following (copy-pasted from my comment on [2]):


   - If there is an instance_uuid on the node, no matter what
     provision/power state it's in, consider the resources as used. If it's
     an orphan, an admin will need to take some manual action anyway.
   - If there is no instance_uuid and a node is in cleaning/clean wait
     after tear down, it is part of the normal node lifecycle, so report all
     resources as used. This means we need a way to determine whether it's a
     manual or an automated clean.
   - If there is no instance_uuid, and a node:
      - has a bad power state, or
      - is in maintenance,
      - or actually in any other case, consider it unavailable and report
        available resources = used resources = 0. The provision state does
        not matter in this logic; all the cases we wanted to take into
        account are described in the first two bullets.
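The three bullets can be sketched as a single decision function returning a (total, used) pair for one resource. This is a minimal sketch under stated assumptions: the `Node` fields and state strings are hypothetical stand-ins, `automated_clean` is a hypothetical flag for exactly the manual vs. automated distinction the second bullet says is missing, and a bad power state is modelled as `None`. The bullets leave the plain "healthy, unused node" case implicit; the sketch assumes such a node in the available state reports all of its capacity as free.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical provision state names, for illustration only.
CLEANING_STATES = {"cleaning", "clean wait"}


@dataclass
class Node:
    total: int                     # e.g. the node's memory_mb
    instance_uuid: Optional[str]
    provision_state: str
    power_state: Optional[str]     # None stands in for a bad power state
    maintenance: bool
    automated_clean: bool = False  # hypothetical manual vs automated flag


def report(node: Node) -> Tuple[int, int]:
    """Return (total, used) for one resource under the proposed rules."""
    # Bullet 1: an instance_uuid means "used", whatever the node's state.
    if node.instance_uuid is not None:
        return node.total, node.total
    # Bullet 2: automated cleaning after tear down is normal lifecycle.
    if node.provision_state in CLEANING_STATES and node.automated_clean:
        return node.total, node.total
    # Assumption (not in the bullets): a healthy, unused, available node
    # is schedulable and reports all capacity as free.
    if (node.provision_state == "available" and node.power_state is not None
            and not node.maintenance):
        return node.total, 0
    # Bullet 3: bad power state, maintenance or any other case.
    return 0, 0
```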


Any thoughts?

[0]. https://bugs.launchpad.net/nova/+bug/1402658
[1]. https://bugs.launchpad.net/nova/+bug/1637449
[2]. https://review.openstack.org/414214
[3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262

Happy holidays to everyone!
-Vlad