On 04/03/16 04:35, Johannes Grassler wrote:
Hello,

I am currently looking into https://bugs.launchpad.net/heat/+bug/1442121 and
am not quite sure how to tackle it, since the problematic code is used for
lots of things[0]. The root cause of the problem is API clients in resource
plugins that do not anticipate that a resource with an entry in Heat's
database may have been deleted from the implementing service's database[1].
Here's an example:

https://github.com/openstack/heat/blob/e4dc942ce1a8c79b450345c7afae326c80d8a5d6/heat/engine/resources/openstack/neutron/floatingip.py#L179


If that happens[1], an uncaught exception will be thrown that, among other
things, breaks the very operations one would need for cleaning up the mess.

As far as I can see, the cleanest way would be to go through all resources
with a fine-tooth comb and add exception handling to the API calls in the
add_dependencies() method where it is missing (just return False for any
resource that no longer exists). Or is there a better way?

Yes, you're right and this sucks. That's not the only problem we've had in this area recently - for example there was also:

https://bugs.launchpad.net/heat/+bug/1536515.

The fact that we have to have these hacked-in implicit dependencies at all is terrible, but we really need to make sure they can't break basic operations like loading a stack from the DB so we can show or delete it. The ideal would be:

* We can guarantee to catch all (non-exit) exceptions, no matter what kind of crazy stuff people write in add_dependencies()
* The plugin developer doesn't have to do anything special to get this behaviour
* The code remains backwards compatible with any third-party resource plugins circulating out there
* We always add as many dependencies as possible (i.e. all non-exception-raising dependencies are added)
* Genuine dependency problems (e.g. non-existent target of get_resource/get_attr) are still surfaced, preferably on CREATE only

I'm pretty sure getting all of those is impossible. I'd be very interested in evaluating different tradeoffs we could make among them though.
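To make one such tradeoff concrete, here is a minimal sketch of the simplest option: wrapping the whole add_dependencies() call in a blanket handler. Everything in it is hypothetical and only for illustration (the Dependencies class, safe_add_dependencies(), BrokenPlugin and the resource names); none of it is Heat's actual code. It catches every non-exit exception without any change to the plugin, but dependencies that would have been added after the failing call are lost, and genuine dependency problems are swallowed along with the spurious ones.

```python
import logging

LOG = logging.getLogger(__name__)


class Dependencies:
    """Hypothetical stand-in for a dependency graph: just a set of edges."""

    def __init__(self):
        self.edges = set()

    def add(self, requirer, required):
        self.edges.add((requirer, required))


def safe_add_dependencies(resource, deps):
    """Call a plugin's add_dependencies() without letting it raise.

    Returns True if the plugin ran to completion, False if it raised.
    """
    try:
        resource.add_dependencies(deps)
    except Exception:
        # Exception does not cover SystemExit or KeyboardInterrupt,
        # so exit requests still propagate.
        LOG.warning('add_dependencies() failed for %s', resource.name,
                    exc_info=True)
        return False
    return True


class BrokenPlugin:
    """Hypothetical plugin whose API lookup fails part-way through."""

    name = 'floating_ip'

    def _lookup_network(self):
        # Simulates a client call for an object deleted behind Heat's back.
        raise LookupError('network no longer exists')

    def add_dependencies(self, deps):
        deps.add(self.name, 'router_interface')  # added before the failure
        self._lookup_network()                   # raises
        deps.add(self.name, 'router_gateway')    # never reached


deps = Dependencies()
completed = safe_add_dependencies(BrokenPlugin(), deps)
print(completed, sorted(deps.edges))
```

In this sketch the stack would still load, but the second edge is silently missing, which is exactly the "as many dependencies as possible" goal that a blanket wrapper cannot meet on its own.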

In the meantime, as you said, we need to find and squash every instance of this problem wherever we can.
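A per-call fix of that kind might look roughly like the sketch below. The NotFound exception, the FakeClient class and the subnets_of() helper are made-up stand-ins, not the real client API; the point is only that each lookup inside add_dependencies() treats "no longer exists" as "nothing to depend on" instead of letting the exception escape.

```python
class NotFound(Exception):
    """Hypothetical stand-in for a client's 'resource not found' error."""


class FakeClient:
    """Hypothetical API client backed by a plain dict."""

    def __init__(self, networks):
        self._networks = networks

    def show_network(self, network_id):
        try:
            return self._networks[network_id]
        except KeyError:
            raise NotFound(network_id) from None


def subnets_of(client, network_id):
    """Return the subnets of a network, or [] if it no longer exists."""
    try:
        return client.show_network(network_id)['subnets']
    except NotFound:
        # The object was deleted outside Heat: contribute no dependencies
        # rather than breaking stack loading.
        return []


client = FakeClient({'net-1': {'subnets': ['subnet-a']}})
print(subnets_of(client, 'net-1'))
print(subnets_of(client, 'net-gone'))
```

Catching only the not-found case keeps other failures visible, at the cost of having to audit every plugin by hand.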

cheers,
Zane.


Cheers,

Johannes

Footnotes:

[0] Whenever a stack's resources are being listed using
     heat.engine.service.list_stack_resources(). resource-list and
     stack-delete both invoke list_stack_resources(). stack-abandon does so
     indirectly (it appears to trigger stack-delete judging by the log, but
     it yields the desired output, at least in Liberty). These are just the
     ones I tested; there are probably more.

[1] It can happen for a number of reasons: either due to resource dependency
     problems upon stack-delete, as happened in the original bug report, or
     due to an operator accidentally deleting resources that are managed by
     Heat.




__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
