[openstack-dev] [heat] Repeating stack-delete many times
Hi all,

While analysing the following bug: https://bugs.launchpad.net/heat/+bug/1418878 I found that the orchestration engine does not handle one case correctly: deleting the same stack (with resources) n times in a row. This can happen if the stack deletion takes a long time and the user sends a second delete request.

The orchestration engine behaves as follows:

1) When the first stack-delete command reaches the heat service, it acquires the stack lock and sends delete requests for the stack's resources to the other clients. Unfortunately, at this point the command has not yet started deleting the resources from the heat DB.

2) Meanwhile, a second stack-delete command for the same stack reaches the heat engine. It steals the stack lock and waits 0.2 sec (a hard-coded constant!) to let the previous stack-delete command finish its operations (of course, the first command does not manage to finish deleting in time). After that, the engine service starts deleting again:
 - Request resources from the heat DB (they exist!)
 - Send delete requests to the other clients (the resources no longer exist because of point 1).

Finally, the stack ends up in the DELETE_FAILED state because the clients raise exceptions during stack delete.

I have some proposals on how to fix it:

p1) Make the waiting time (0.2 sec) configurable. This would let the first stack-delete operation finish before the second command starts deleting. From my point of view, this is just a workaround, because different stacks (and operations) take different amounts of time.

p2) Deny lock stealing if the current thread is executing a delete. As an option, we could wait for the other thread if the stack is being deleted, but it seems this cannot be detected with the current implementation.

p3) Just leave it as it is. IMO, this should be the last resort.

Do you have any other proposals for handling such cases? Perhaps there is a more appropriate solution.
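The two steps above can be reproduced with a deterministic, single-threaded simulation. This is a minimal sketch, assuming the heat DB is modelled as a dict and each remote service raises on an unknown resource ID; `ResourceGone`, `FakeRemoteClient`, `send_remote_deletes` and `delete_stack` are hypothetical names, not heat APIs. The lock stealing itself is not modelled: step 1 is simply interrupted before the heat DB rows are purged.

```python
# Hypothetical stand-ins for illustration only; none of these are heat APIs.
class ResourceGone(Exception):
    """Raised by the fake remote service for a resource that no longer exists."""


class FakeRemoteClient:
    """Remote service (nova, neutron, ...) tracking which resources still exist."""

    def __init__(self, resource_ids):
        self.live = set(resource_ids)

    def delete(self, resource_id):
        if resource_id not in self.live:
            raise ResourceGone(resource_id)
        self.live.remove(resource_id)


def send_remote_deletes(heat_db, client):
    """Ask the remote service to delete every resource recorded in the heat DB."""
    for resource_id in list(heat_db):
        client.delete(resource_id)


def delete_stack(heat_db, client):
    """Full stack delete: remote deletes first, heat DB rows purged afterwards."""
    try:
        send_remote_deletes(heat_db, client)
    except ResourceGone:
        return "DELETE_FAILED"
    heat_db.clear()
    return "DELETE_COMPLETE"


heat_db = {"server-1": "OS::Nova::Server",
           "fip-assoc-1": "OS::Nova::FloatingIPAssociation"}
client = FakeRemoteClient(heat_db)

# 1) First delete: the remote deletes are sent, but its lock is stolen
#    before the heat DB rows are purged.
send_remote_deletes(heat_db, client)
print(sorted(heat_db))                # ['fip-assoc-1', 'server-1']

# 2) Second delete: reads the same DB rows and re-sends the deletes,
#    which the remote service rejects because the resources are gone.
print(delete_stack(heat_db, client))  # DELETE_FAILED
```

A single, uninterrupted delete on a fresh stack returns DELETE_COMPLETE in this model; the failure comes only from replaying remote deletes for rows that the first request had not yet purged.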
Thank you,
Kairat Kushaev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [heat] Repeating stack-delete many times
(In reply to Kairat Kushaev kkush...@mirantis.com, Tue, Feb 10, 2015 at 2:04 PM)

Sorry for the flood, I forgot p4: prohibit stack deletion if the current stack action and status are (DELETE, IN_PROGRESS), and raise a "not supported" exception in the heat engine. This is possible because the stack state is updated before the deletion starts.
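p4 can be sketched as a guard that checks the stack's (action, status) pair before a delete starts. This is a minimal single-threaded sketch; `Stack`, `NotSupported` and `request_delete` are hypothetical names that only mirror the idea, not heat's own classes.

```python
# Hypothetical stand-ins for illustration only; not heat's real classes.
class NotSupported(Exception):
    """Raised when a delete is requested for a stack already being deleted."""


class Stack:
    def __init__(self, name):
        self.name = name
        self.action = "CREATE"
        self.status = "COMPLETE"

    def state(self):
        return (self.action, self.status)


def request_delete(stack):
    """Reject the request if a delete is in progress, else mark the stack."""
    if stack.state() == ("DELETE", "IN_PROGRESS"):
        raise NotSupported("stack %s is already being deleted" % stack.name)
    # The state is recorded before any resource is touched, so a request
    # arriving later sees DELETE/IN_PROGRESS and is rejected.
    stack.action, stack.status = "DELETE", "IN_PROGRESS"
    return stack.state()


stack = Stack("demo")
print(request_delete(stack))   # ('DELETE', 'IN_PROGRESS')
try:
    request_delete(stack)      # second stack-delete for the same stack
except NotSupported as exc:
    print(exc)                 # stack demo is already being deleted
```

In the real engine the check and the state update would presumably need to happen atomically (for example while holding the stack lock, or as a conditional DB update); otherwise two requests arriving at the same moment could both pass the check.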
Re: [openstack-dev] [heat] Repeating stack-delete many times
(In reply to Steven Hardy sha...@redhat.com, Tue, Feb 10, 2015 at 4:04 PM)

Thanks for the explanation, Steven. I will try to figure out why it is not working in nova.
Re: [openstack-dev] [heat] Repeating stack-delete many times
On Tue, Feb 10, 2015 at 03:04:39PM +0400, Kairat Kushaev wrote:
> Hi all,
> During the analysis of the following bug:
> https://bugs.launchpad.net/heat/+bug/1418878
> [...]
> After that engine service starts the deleting again:
>    - Request resources from heat DB (They exist!)
>    - Send requests for delete to other clients (They do not exist because of
>      point 1).

This is expected, and it is the reason most resource handle_delete paths include the following error path, which ignores any "does not exist" errors:

    self.client_plugin().ignore_not_found(e)

> Finally, we have stack in DELETE_FAILED state because the clients raise
> exceptions during stack delete.
This is the bug: the exception being raised isn't ignored by the nova client plugin, which by default only ignores NotFound exceptions:

https://github.com/openstack/heat/blob/master/heat/engine/clients/os/nova.py#L85

In this case, I think the problem is that you're getting a Conflict exception when attempting to re-delete the NovaFloatingIpAssociation:

https://github.com/openstack/heat/blob/master/heat/engine/resources/nova_floatingip.py#L148

So I think this is probably a bug specific to NovaFloatingIpAssociation rather than a problem we need to fix across all resources?

I'd probably suggest we either add another except clause which catches (and ignores) this situation, or check whether novaclient is raising the wrong exception type, as NotFound would seem a saner error than Conflict when trying to delete a non-existent association?

Steve
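The first suggestion (an extra except clause) can be sketched as follows. This is a minimal, self-contained sketch, assuming novaclient raises Conflict on a repeated removal as described in this thread; `NotFound`, `Conflict`, `ignore_not_found`, `FakeNovaClient` and `FloatingIpAssociationSketch` are hypothetical stand-ins, not the real novaclient or heat classes, and the `ignore_conflict` flag exists only to show the behaviour with and without the extra clause.

```python
# Hypothetical stand-ins for the novaclient exceptions and heat resource.
class NotFound(Exception):
    pass


class Conflict(Exception):
    pass


def ignore_not_found(exc):
    """Swallow NotFound and re-raise anything else (the plugin's default idea)."""
    if not isinstance(exc, NotFound):
        raise exc


class FakeNovaClient:
    def __init__(self):
        self.associated = True

    def remove_floating_ip(self):
        if not self.associated:
            # Assumed behaviour from this thread: removing an association
            # that is already gone yields Conflict rather than NotFound.
            raise Conflict("floating IP is not associated")
        self.associated = False


class FloatingIpAssociationSketch:
    def __init__(self, client, ignore_conflict):
        self.client = client
        self.ignore_conflict = ignore_conflict  # demo switch only

    def handle_delete(self):
        try:
            self.client.remove_floating_ip()
        except Conflict:
            if not self.ignore_conflict:
                raise
            # Extra except clause: the association is already gone,
            # so treat Conflict like NotFound and carry on.
        except Exception as e:
            ignore_not_found(e)


strict = FloatingIpAssociationSketch(FakeNovaClient(), ignore_conflict=False)
strict.handle_delete()            # first delete removes the association
try:
    strict.handle_delete()        # repeated delete: Conflict escapes
except Conflict as exc:
    print("DELETE_FAILED:", exc)  # DELETE_FAILED: floating IP is not associated

tolerant = FloatingIpAssociationSketch(FakeNovaClient(), ignore_conflict=True)
tolerant.handle_delete()
tolerant.handle_delete()          # Conflict swallowed: delete is idempotent
print("DELETE_COMPLETE")
```

If novaclient raised NotFound here instead (the second suggestion), the generic `ignore_not_found` path alone would make the repeated delete succeed in this sketch, with no resource-specific clause needed.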