Re: [openstack-dev] Nova quota statistics counting issue

2016-04-15 Thread Matt Riedemann



On 4/14/2016 3:07 PM, Andrew Laski wrote:

> On Wed, Apr 13, 2016, at 12:27 PM, Dmitry Stepanenko wrote:
>
>> Hi Team,
>> I worked on the nova quota statistics issue
>> (https://bugs.launchpad.net/nova/+bug/1284424) that happens when nova-*
>> processes are restarted while instances are being removed, and I was
>> able to reproduce it. For the repro I used devstack and started nova-api
>> and nova-compute in separate screen windows. To kill them I used
>> ctrl+c. I found that this issue happens if nova-* processes are killed
>> after the instance is deleted but right before the quota commit
>> procedure finishes.
>> We discussed these results with Markus Zoeller and decided that even
>> though killing nova processes is a somewhat exotic event, this should
>> still be fixed because quota counting affects billing and is very
>> important for us.
>
> +1. This is very important to get right. And while killing Nova
> processes is exotic during normal operation, it could happen during
> upgrades, and that should not cause quota issues.
>
>> So, we need to introduce some mechanism that will prevent us from
>> reaching inconsistent states in terms of quotas. In other words, this
>> mechanism should work in such a way that the instance create/remove
>> operation and the quota usage recount operation either both happen or
>> neither does.
>
> There's been some discussion around this, and there are other ML threads
> that touch on it in the context of moving quota enforcement into a
> centralized service/library. There are a couple of approaches that could
> be taken for tackling quotas, but a larger issue is that we have no good
> way of knowing whether a given change helps the situation. What we need
> before making any changes is a functional test that reproduces the issue.
>
> Once that is in place, I would love to see the quota_usages table and
> reservations removed, and have quota based on actual usage as
> represented in the instances table. But there are a lot of other
> viewpoints, and I think work in this area is going to have to proceed
> through small, incremental improvements.
>
>> Any ideas how to do that properly?
>> Kind regards,
>> Dmitry

OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






I've tried to start that here [1], but it needs work. I also have a
messier local version that was (I think) reproducing a failure, but
because it's a weird race condition, it's hard to test and to know when
to make the assertion and stop the test.
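One way to make the assertion point deterministic is to inject the failure at a fixed step instead of racing a real kill. The sketch below is a toy Python model, not Nova code: `ToyCloud`, `SimulatedCrash` and the `before_quota_commit` hook are hypothetical names, and a real functional test would need an equivalent hook (for example a mock that raises) between the instance delete and the quota commit.

```python
"""Deterministic reproduction of quota drift (toy model, not Nova code)."""


class SimulatedCrash(Exception):
    """Stands in for the nova-* process being killed (e.g. with ctrl+c)."""


class ToyCloud:
    """Hypothetical model: an instances table plus a cached usage counter."""

    def __init__(self):
        self.instances = set()
        self.quota_usage = 0  # cached count, playing the role of quota_usages

    def create(self, uuid):
        self.instances.add(uuid)
        self.quota_usage += 1

    def delete(self, uuid, before_quota_commit=None):
        # Step 1: the instance record is removed.
        self.instances.discard(uuid)
        # Hypothetical hook between the two steps: a test can raise here to
        # simulate the process dying exactly inside the race window.
        if before_quota_commit is not None:
            before_quota_commit()
        # Step 2: the quota usage is committed separately (not atomic with step 1).
        self.quota_usage -= 1


def crash():
    raise SimulatedCrash()


def reproduce_drift():
    """Return (cached usage, real instance count) after a mid-delete crash."""
    cloud = ToyCloud()
    cloud.create("vm-1")
    try:
        cloud.delete("vm-1", before_quota_commit=crash)
    except SimulatedCrash:
        pass  # the "process" died; the quota commit never ran
    return cloud.quota_usage, len(cloud.instances)


if __name__ == "__main__":
    print(reproduce_drift())  # (1, 0): usage still counts a deleted instance
```

Because the failure always happens at the same step, the test knows exactly when to assert: right after the simulated crash, compare the cached usage with the real count.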


Maybe I'll just push up the latest WIP of what I have locally and then 
someone else can take it over if they want.


[1] https://review.openstack.org/#/c/293800/

--

Thanks,

Matt Riedemann




Re: [openstack-dev] Nova quota statistics counting issue

2016-04-14 Thread Andrew Laski
 
 
 
On Wed, Apr 13, 2016, at 12:27 PM, Dmitry Stepanenko wrote:
> Hi Team,
> I worked on the nova quota statistics issue
> (https://bugs.launchpad.net/nova/+bug/1284424) that happens when nova-*
> processes are restarted while instances are being removed, and I was
> able to reproduce it. For the repro I used devstack and started nova-api
> and nova-compute in separate screen windows. To kill them I used ctrl+c.
> I found that this issue happens if nova-* processes are killed after the
> instance is deleted but right before the quota commit procedure finishes.
> We discussed these results with Markus Zoeller and decided that even
> though killing nova processes is a somewhat exotic event, this should
> still be fixed because quota counting affects billing and is very
> important for us.
 
+1. This is very important to get right. And while killing Nova
processes is exotic during normal operation, it could happen during
upgrades, and that should not cause quota issues.
 
> So, we need to introduce some mechanism that will prevent us from
> reaching inconsistent states in terms of quotas. In other words, this
> mechanism should work in such a way that the instance create/remove
> operation and the quota usage recount operation either both happen or
> neither does.
 
There's been some discussion around this, and there are other ML threads
that touch on it in the context of moving quota enforcement into a
centralized service/library. There are a couple of approaches that could
be taken for tackling quotas, but a larger issue is that we have no good
way of knowing whether a given change helps the situation. What we need
before making any changes is a functional test that reproduces the issue.

Once that is in place, I would love to see the quota_usages table and
reservations removed, and have quota based on actual usage as
represented in the instances table. But there are a lot of other
viewpoints, and I think work in this area is going to have to proceed
through small, incremental improvements.
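As a rough illustration of basing quota on actual usage (not Nova's real schema or API), the sketch below derives usage with a `COUNT(*)` over a hypothetical `instances` table instead of reading a cached quota_usages row. SQLite stands in for the real database, and `BEGIN IMMEDIATE` is used to serialize the check and the insert; a real implementation would need an equivalent locking or retry strategy for its own database.

```python
import sqlite3


class QuotaExceeded(Exception):
    pass


def make_db():
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical, minimal schema for illustration only.
    conn.execute(
        "CREATE TABLE instances (uuid TEXT PRIMARY KEY, project_id TEXT NOT NULL)"
    )
    return conn


def count_usage(conn, project_id):
    # Usage is derived from the instances table itself; there is no cached counter.
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE project_id = ?", (project_id,)
    ).fetchone()
    return n


def create_instance(conn, uuid, project_id, limit):
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so the count and
    # the insert cannot be interleaved with another writer.
    conn.execute("BEGIN IMMEDIATE")
    try:
        if count_usage(conn, project_id) + 1 > limit:
            raise QuotaExceeded(project_id)
        conn.execute(
            "INSERT INTO instances (uuid, project_id) VALUES (?, ?)",
            (uuid, project_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def delete_instance(conn, uuid):
    # Removing the row *is* the usage change; there is no second step to commit.
    conn.execute("DELETE FROM instances WHERE uuid = ?", (uuid,))


if __name__ == "__main__":
    conn = make_db()
    create_instance(conn, "vm-1", "demo", limit=2)
    create_instance(conn, "vm-2", "demo", limit=2)
    try:
        create_instance(conn, "vm-3", "demo", limit=2)
    except QuotaExceeded:
        print("over quota")
    delete_instance(conn, "vm-1")
    print(count_usage(conn, "demo"))  # 1
```

Because there is no separate counter, a process dying in the middle of a delete cannot leave usage out of sync with the instances that actually exist; the trade-off is a count query on every quota check.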
 
 
> Any ideas how to do that properly?
> Kind regards,
> Dmitry


Re: [openstack-dev] Nova quota statistics counting issue

2016-04-14 Thread Salvatore Orlando

For what it's worth, Neutron employs "resource trackers", which conceptually
do something similar to Nova's quota usage statistics.
Before starting any transaction that can potentially change usage for a
given resource, the quota enforcement mechanism checks for a "dirty" marker
on the resource tracker.
If that marker is present, usage data for that resource is recalculated from
the resource's DB table. If not, the current usage is used for quota
enforcement and the "dirty" flag is set.

This means that if the process dies in the middle of a transaction, the
next transaction will rebuild the correct usage count from the DB.
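The dirty-flag scheme described above can be modelled in a few lines. This is a toy, single-process sketch, not Neutron's code: `ResourceTracker`, `usage_for_enforcement` and `mark_committed` are hypothetical names, and clearing the flag after a successful commit is an assumption the description does not spell out. In a real deployment the flag would have to live in shared, persistent storage (for example a DB column) rather than in process memory.

```python
"""Toy dirty-flag usage tracker (hypothetical, not Neutron's implementation)."""


class ResourceTracker:
    def __init__(self, count_from_db):
        # count_from_db: callable that recounts usage from the resource's table.
        self._count_from_db = count_from_db
        self.cached_usage = count_from_db()
        self.dirty = False

    def usage_for_enforcement(self):
        """Return the usage to enforce quota against before a change."""
        if self.dirty:
            # A previous operation may not have finished: rebuild from the DB.
            self.cached_usage = self._count_from_db()
        # Mark dirty *before* the change, so a crash mid-transaction leaves
        # the flag set and the next caller recounts.
        self.dirty = True
        return self.cached_usage

    def mark_committed(self, delta):
        # Assumption: after a successful commit, adjust the cache and clear the flag.
        self.cached_usage += delta
        self.dirty = False


if __name__ == "__main__":
    rows = {"port-1"}
    tracker = ResourceTracker(lambda: len(rows))

    # A create that completes normally.
    tracker.usage_for_enforcement()
    rows.add("port-2")
    tracker.mark_committed(+1)

    # A create where the process "dies" after writing the row:
    # mark_committed() is never called, so the flag stays set.
    tracker.usage_for_enforcement()
    rows.add("port-3")

    # The next operation sees the flag and recounts: 3, not the stale 2.
    print(tracker.usage_for_enforcement())  # 3
```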

Salvatore


On 14 April 2016 at 14:08, Timofei Durakov  wrote:

> Hi,
>
> I think it would be OK to persistently store quota details on the compute
> side, as was discussed during the Mitaka mid-cycle [1] for migrations [2].
> That way, if the compute service fails, we could restore the state and
> update the quota after the compute service restarts.
>
> Timofey
>
> [1] - https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
> [2] - https://review.openstack.org/#/c/291161/5/nova/compute/background.py
>
>
>
>
> On Wed, Apr 13, 2016 at 7:27 PM, Dmitry Stepanenko <
> dstepane...@mirantis.com> wrote:
>
>> Hi Team,
>>
>> I worked on the nova quota statistics issue
>> (https://bugs.launchpad.net/nova/+bug/1284424) that happens when nova-*
>> processes are restarted while instances are being removed, and I was able
>> to reproduce it. For the repro I used devstack and started nova-api and
>> nova-compute in separate screen windows. To kill them I used ctrl+c. I
>> found that this issue happens if nova-* processes are killed after the
>> instance is deleted but right before the quota commit procedure finishes.
>>
>> We discussed these results with Markus Zoeller and decided that even
>> though killing nova processes is a somewhat exotic event, this should
>> still be fixed because quota counting affects billing and is very
>> important for us.
>>
>> So, we need to introduce some mechanism that will prevent us from
>> reaching inconsistent states in terms of quotas. In other words, this
>> mechanism should work in such a way that the instance create/remove
>> operation and the quota usage recount operation either both happen or
>> neither does.
>>
>> Any ideas how to do that properly?
>>
>> Kind regards,
>> Dmitry
>>


Re: [openstack-dev] Nova quota statistics counting issue

2016-04-14 Thread Timofei Durakov

Hi,

I think it would be OK to persistently store quota details on the compute
side, as was discussed during the Mitaka mid-cycle [1] for migrations [2].
That way, if the compute service fails, we could restore the state and
update the quota after the compute service restarts.
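A minimal sketch of the compute-side persistence idea, assuming a local JSON journal file: the quota delta is written to disk before the instance is destroyed, removed once the quota update is committed, and replayed when the service starts. `QuotaJournal`, `commit_quota` and the file layout are hypothetical and are not taken from the review linked as [2].

```python
import json
import os
import tempfile


class QuotaJournal:
    """Hypothetical compute-side journal of quota updates not yet committed."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def _save(self, entries):
        # Write a temp file, fsync it, then atomically rename it into place,
        # so a crash mid-write cannot leave a truncated journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def record(self, instance_uuid, delta):
        """Persist the pending quota change before touching the instance."""
        entries = self._load()
        entries[instance_uuid] = delta
        self._save(entries)

    def clear(self, instance_uuid):
        """Drop the entry once the quota change has been committed."""
        entries = self._load()
        entries.pop(instance_uuid, None)
        self._save(entries)

    def replay(self, commit_quota):
        """On service start, re-apply any quota update that never finished."""
        for uuid, delta in self._load().items():
            commit_quota(uuid, delta)
            self.clear(uuid)


if __name__ == "__main__":
    usage = {"instances": 1}

    def commit_quota(uuid, delta):
        usage["instances"] += delta

    journal = QuotaJournal(os.path.join(tempfile.mkdtemp(), "quota.json"))
    journal.record("vm-1", -1)  # persisted before the instance is destroyed
    # ... instance destroyed, then the process dies before commit_quota() runs ...

    # After restart:
    journal.replay(commit_quota)
    print(usage)  # {'instances': 0}
```

Replay in this sketch is at-least-once: if the process dies after `commit_quota()` but before `clear()`, the delta is applied again on the next start, so a real implementation would need an idempotent commit (for example keyed by instance UUID).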

Timofey

[1] - https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
[2] - https://review.openstack.org/#/c/291161/5/nova/compute/background.py




On Wed, Apr 13, 2016 at 7:27 PM, Dmitry Stepanenko  wrote:

> Hi Team,
>
> I worked on the nova quota statistics issue
> (https://bugs.launchpad.net/nova/+bug/1284424) that happens when nova-*
> processes are restarted while instances are being removed, and I was able
> to reproduce it. For the repro I used devstack and started nova-api and
> nova-compute in separate screen windows. To kill them I used ctrl+c. I
> found that this issue happens if nova-* processes are killed after the
> instance is deleted but right before the quota commit procedure finishes.
>
> We discussed these results with Markus Zoeller and decided that even
> though killing nova processes is a somewhat exotic event, this should
> still be fixed because quota counting affects billing and is very
> important for us.
>
> So, we need to introduce some mechanism that will prevent us from reaching
> inconsistent states in terms of quotas. In other words, this mechanism
> should work in such a way that the instance create/remove operation and
> the quota usage recount operation either both happen or neither does.
>
> Any ideas how to do that properly?
>
> Kind regards,
> Dmitry
>