[openstack-dev] [heat] Repeating stack-delete many times

2015-02-10 Thread Kairat Kushaev
Hi all,
During the analysis of the following bug:
https://bugs.launchpad.net/heat/+bug/1418878
I figured out that the orchestration engine doesn't work properly in some cases.
The case is the following:
deleting the same stack (one that has resources) n times in a row.
This can happen if deleting the stack takes a long time and the user sends
a second delete request.
Orchestration engine behavior is the following:
1) When the first stack-delete command reaches the heat service,
it acquires the stack lock and sends delete requests for the resources
to the other clients.
Unfortunately, it has not yet started deleting the resources from the heat DB.
2) At that time a second stack-delete command for the same stack
reaches the heat engine. It steals the stack lock and waits 0.2 seconds
(a hard-coded constant!) to allow the previous stack-delete command to
finish its operations (of course, the first one did not manage to finish
deleting in time). After that, the engine service starts the deleting again:
 - It requests the resources from the heat DB (they still exist!)
 - It sends delete requests to the other clients (the resources no longer
   exist because of point 1).
Finally, the stack ends up in the DELETE_FAILED state because the clients
raise exceptions during the stack delete.
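The race can be sketched as a minimal simulation (names here are hypothetical, not the actual Heat internals):

```python
# Illustrative sketch of the race: the second delete re-reads stale DB
# rows and re-sends deletes for resources the first delete already removed.

class NotFound(Exception):
    pass

db_resources = ["server-1", "floating-ip-1"]   # rows still in the heat DB
backend = {"server-1", "floating-ip-1"}        # objects in the backing services

def backend_delete(name):
    if name not in backend:
        raise NotFound(name)                   # what a client raises on re-delete
    backend.discard(name)

# 1) First stack-delete: removes the backend objects, but the engine has
#    not yet deleted the rows from the heat DB when the lock is stolen.
for name in db_resources:
    backend_delete(name)

# 2) Second stack-delete: reads the (stale) DB rows and re-sends deletes.
errors = []
for name in db_resources:
    try:
        backend_delete(name)
    except NotFound as exc:
        errors.append(str(exc))                # stack ends up DELETE_FAILED
```

Both re-deletes fail even though the first delete did its job, which is exactly the DELETE_FAILED outcome from the bug report.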
I have some proposals for how to fix it:
p1) Make the waiting time (0.2 sec) configurable. That allows the first
stack-delete to finish its operations before the second command starts
deleting. From my point of view this is just a workaround, because
different stacks (and operations) take different amounts of time.
p2) Deny lock stealing while the current thread is executing a delete.
As an option, we could wait for the other thread if the stack is being
deleted, but it seems this is not possible to detect with the current
solution.
p3) Just leave it as it is. IMO, the last resort.
Do you have any other proposals for how to manage such cases?
Perhaps a more proper solution exists.

Thank You,
Kairat Kushaev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] Repeating stack-delete many times

2015-02-10 Thread Kairat Kushaev
Sorry for the flood,
I forgot p4:
Prohibit stack deletion if the current stack (state, status) is
(DELETE, IN_PROGRESS).
Raise a "not supported" exception in the heat engine. This is possible
because the stack state is updated before the deletion starts.
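Proposal p4 could look roughly like this (a sketch with hypothetical names mirroring Heat's (action, status) convention, not the real engine code):

```python
# Sketch of p4: refuse to start a delete while the stack's
# (action, status) pair is already (DELETE, IN_PROGRESS).

class ActionInProgress(Exception):
    pass

DELETE, IN_PROGRESS, COMPLETE = "DELETE", "IN_PROGRESS", "COMPLETE"

class Stack:
    def __init__(self):
        self.action = None
        self.status = None

    def delete(self):
        if (self.action, self.status) == (DELETE, IN_PROGRESS):
            # a concurrent caller sees the in-progress state and is rejected
            raise ActionInProgress("stack delete already in progress")
        # the state is updated before the deletion itself starts, so any
        # later caller hits the guard above
        self.action, self.status = DELETE, IN_PROGRESS
        # ... delete resources here ...
        self.status = COMPLETE

stack = Stack()
stack.action, stack.status = DELETE, IN_PROGRESS  # first delete still running
try:
    stack.delete()                                # second delete is rejected
    rejected = False
except ActionInProgress:
    rejected = True
```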






Re: [openstack-dev] [heat] Repeating stack-delete many times

2015-02-10 Thread Kairat Kushaev
Thanks for the explanation, Steven.
I will try to figure out why it is not working in nova.




Re: [openstack-dev] [heat] Repeating stack-delete many times

2015-02-10 Thread Steven Hardy
On Tue, Feb 10, 2015 at 03:04:39PM +0400, Kairat Kushaev wrote:
Hi all,
During the analysis of the following bug:
https://bugs.launchpad.net/heat/+bug/1418878
I figured out that the orchestration engine doesn't work properly in some
cases.
The case is the following:
deleting the same stack (one that has resources) n times in a row.
This can happen if deleting the stack takes a long time and the user sends
a second delete request.
Orchestration engine behavior is the following:
1) When the first stack-delete command reaches the heat service,
it acquires the stack lock and sends delete requests for the resources
to the other clients.
Unfortunately, it has not yet started deleting the resources from the heat DB.
2) At that time a second stack-delete command for the same stack
reaches the heat engine. It steals the stack lock and waits 0.2 seconds
(a hard-coded constant!) to allow the previous stack-delete command to
finish its operations (of course, the first one did not manage to finish
deleting in time). After that, the engine service starts the deleting again:
 - It requests the resources from the heat DB (they still exist!)
 - It sends delete requests to the other clients (the resources no longer
   exist because of point 1).

This is expected, and the reason for the following error path in most
resource handle_delete paths is to ignore any "does not exist" errors:

  self.client_plugin().ignore_not_found(e)
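That error path usually looks roughly like this (a sketch with illustrative stand-ins, not the real Heat classes): attempt the backend delete and swallow "does not exist" errors so a repeated delete is harmless.

```python
# Sketch of the typical resource delete error path: NotFound is silently
# ignored, anything else still propagates and fails the delete.

class NotFound(Exception):
    pass

class ClientPlugin:
    def ignore_not_found(self, exc):
        if not isinstance(exc, NotFound):
            raise exc            # anything else still fails the delete

def handle_delete(client, plugin, resource_id):
    try:
        client.delete(resource_id)
    except Exception as e:
        plugin.ignore_not_found(e)

class GoneClient:
    def delete(self, resource_id):
        raise NotFound(resource_id)   # resource was already deleted

handle_delete(GoneClient(), ClientPlugin(), "fip-1")  # no exception escapes
```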

Finally, we have the stack in DELETE_FAILED state because the clients
raise exceptions during the stack delete.

This is the bug: the exception which is raised isn't getting ignored by the
nova client plugin, which by default only ignores NotFound exceptions:

https://github.com/openstack/heat/blob/master/heat/engine/clients/os/nova.py#L85

In this case, I think the problem is you're getting a Conflict exception
when attempting to re-delete the NovaFloatingIpAssociation:

https://github.com/openstack/heat/blob/master/heat/engine/resources/nova_floatingip.py#L148

So, I think this is probably a bug specific to NovaFloatingIpAssociation
rather than a problem we need to fix across all resources?

I'd probably suggest we either add another except clause which catches (and
ignores) this situation, or look at whether novaclient is raising the wrong
exception type, as NotFound would appear to be a saner error than
Conflict when trying to delete a non-existent association?
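The first option might look something like this (an illustrative sketch; the class and method names are stand-ins for the real novaclient/Heat types, not the actual fix):

```python
# Sketch: also tolerate a Conflict from nova when the association is
# already gone, so re-running the delete is idempotent.

class NotFound(Exception):
    pass

class Conflict(Exception):
    pass

def handle_delete(client, server_id, floating_ip):
    try:
        client.remove_floating_ip(server_id, floating_ip)
    except NotFound:
        pass        # already ignored today
    except Conflict:
        pass        # proposed: association already removed, treat as gone

class FakeNova:
    def remove_floating_ip(self, server_id, floating_ip):
        raise Conflict("floating ip is not associated with this server")

handle_delete(FakeNova(), "server-1", "10.0.0.5")  # re-delete no longer fails
```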

Steve
