On 15/10/15 12:42, Matt Fischer wrote:
>
>
> On Thu, Oct 8, 2015 at 5:38 AM, Vladimir Kuklin <vkuk...@mirantis.com
> <mailto:vkuk...@mirantis.com>> wrote:
>
>     Hi, folks
>
>     * Intro
>
>     Per our discussion at Meeting #54 [0] I would like to propose the
>     uniform approach of exception handling for all puppet-openstack
>     providers accessing any types of OpenStack APIs.
>
>     * Problem Description
>
>     While working on Fuel during deployment of multi-node HA-aware
>     environments we faced many intermittent operational issues, e.g.:
>
>     401/403 authentication failures when we were doing scaling of
>     OpenStack controllers due to difference in hashing view between
>     keystone instances
>     503/502/504 errors due to temporary connectivity issues

The 5xx errors are not connectivity issues:

500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported

I believe nothing should be done to trap them.

The connectivity issues are a different matter (to be addressed as
mentioned by Matt).

>     non-idempotent operations like deletion or creation - e.g. if you
>     are deleting an endpoint and someone is deleting on the other node
>     and you get 404 - you should continue with success instead of
>     failing. 409 Conflict error should also signal us to re-fetch
>     resource parameters and then decide what to do with them.
>
>     Obviously, it is not optimal to rerun puppet to correct such errors
>     when we can just handle an exception properly.
>
>     * Current State of Art
>
>     There is some exception handling, but it does not cover all the
>     aforementioned use cases.
>
>     * Proposed solution
>
>     Introduce a library of exception handling methods which should be
>     the same for all puppet openstack providers as these exceptions seem
>     to be generic. Then, for each of the providers we can introduce
>     provider-specific libraries that will inherit from this one.
>
>     Our mos-puppet team could add this into their backlog and could work
>     on that in upstream or downstream and propose it upstream.
>
>     What do you think on that, puppet folks?
>

The real issue is that we're dealing with openstackclient, which is a CLI
tool and not an API, so no error propagation is expected.

Using the REST interfaces for all OpenStack APIs would give us all the
HTTP errors. Check for "HTTP Response Classes" in
http://ruby-doc.org/stdlib-2.2.3/libdoc/net/http/rdoc/Net/HTTP.html

>     [0]
>     http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html
>
>
> I think that we should look into some solutions here as I'm generally
> for something we can solve once and re-use.
> Currently we solve some of
> this at TWC by serializing our deploys and disabling puppet site wide
> while we do so. This avoids the issue of Keystone on one node removing
> an endpoint while the other nodes (who still have old code) keep trying
> to add it back.
>
> For connectivity issues especially after service restarts, we're using
> puppet-healthcheck [0] and I'd like to discuss that more in Tokyo as an
> alternative to explicit retries and delays. It's in the etherpad so
> hopefully you can attend.

+1

>
> [0] - https://github.com/puppet-community/puppet-healthcheck
>
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
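
To make the point about HTTP response classes concrete, here is a rough
sketch of how a provider talking to the REST API directly could map a
response to an action. It is only an illustration: the helper name
action_for and the :ok / :refetch / :fail symbols are hypothetical and not
part of any puppet-openstack module. It assumes the behaviour discussed in
this thread: a 404 on delete counts as success, a 409 means re-fetch and
decide again, and 5xx errors are not trapped.

```ruby
require 'net/http'

# Hypothetical helper (not an existing puppet-openstack API): decide what a
# provider should do with a REST response for a given operation.
def action_for(response, operation)
  case response
  when Net::HTTPSuccess
    :ok
  when Net::HTTPNotFound
    # On delete, the resource is already gone, so the desired state holds.
    operation == :delete ? :ok : :fail
  when Net::HTTPConflict
    # Re-fetch the resource parameters and decide again.
    :refetch
  else
    # Everything else, including all 5xx (Net::HTTPServerError), is not trapped.
    :fail
  end
end

# Response objects can be built without a network connection for illustration.
gone = Net::HTTPNotFound.new('1.1', '404', 'Not Found')
puts action_for(gone, :delete)
puts action_for(gone, :create)
```

The specific classes (404, 409) are matched before the generic fallback
because they are subclasses of Net::HTTPClientError; with the CLI there is
no equivalent object to dispatch on.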