On 15/10/15 21:10, Vladimir Kuklin wrote:
> Gilles,
>
> 5xx errors like 503 and 502/504 could always be intermittent operational
> issues. E.g. when you access your keystone backends through some proxy
> and there is a connectivity issue between the proxy and the backends
> which disappears in 10 seconds, you do not need to rerun puppet
> completely - just retry the request.
>

Look, I don't have much experience with those errors in real-world
scenarios. And just a detail, for my understanding: those errors come from
a running HTTP service, so this is not a connectivity issue to the service
but something going wrong beyond it.
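If I understand the suggestion correctly, it amounts to wrapping each API
call in a small retry loop before letting the resource fail. A minimal
sketch, assuming the transient failure surfaces as a Ruby exception (the
helper and the names below are made up for illustration, not existing
puppet-openstack code):

```ruby
require 'net/http'

# Failures that plausibly disappear on their own within a few seconds:
# refused/reset connections, timeouts, and 5xx responses turned into
# exceptions via Net::HTTPResponse#value.
TRANSIENT_ERRORS = [Errno::ECONNREFUSED, Errno::ECONNRESET,
                    Net::OpenTimeout, Net::ReadTimeout,
                    Net::HTTPFatalError].freeze

# Hypothetical helper: retry the block a few times on transient failures,
# re-raise anything else (or the last transient error) unchanged.
def with_retries(attempts: 3, delay: 2)
  tries = 0
  begin
    yield
  rescue *TRANSIENT_ERRORS
    tries += 1
    raise if tries >= attempts
    sleep(delay)
    retry
  end
end

# Usage sketch - the block would be the real request, e.g.:
#   with_retries do
#     res = Net::HTTP.get_response(URI('http://127.0.0.1:5000/v3'))
#     res.value  # raises Net::HTTPFatalError on 502/503/504
#     res
#   end
```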
> Regarding "REST interfaces for all Openstack API" - this is very close
> to another topic that I raised ([0]) - using a native Ruby application
> and handling the exceptions. Otherwise, whenever we have an OpenStack
> client (the generic one or the neutron/glance/etc. one) sending us a
> message like '[111] Connection refused', that message is very much
> determined by the framework that OpenStack is using for its clients in
> this release. It could be `requests` or any other framework, which sends
> a different text message depending on its version. So it is very
> bothersome to write a bunch of 'if' clauses or gigantic regexps instead
> of handling a simple Ruby exception. So I agree with you here - we need
> to work with the API directly. And, by the way, if you also support
> switching to a native Ruby OpenStack API client, please feel free to
> support the movement towards it in the thread [0].
>

Yes, I totally agree with you on that approach (native Ruby lib). That is
why I mentioned it here: for me the exception handling would be solved at
once.
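Just to make the difference concrete: today we have to guess the failure
from whatever text the client prints, while a native Ruby call gives us a
typed exception to rescue. Everything below is an invented comparison, not
code from any provider, and the URL is a placeholder:

```ruby
require 'open3'
require 'net/http'

# Today: shell out to the CLI and pattern-match free-form error text.
begin
  _out, err, status = Open3.capture3('openstack', 'endpoint', 'list')
  if !status.success? && err.include?('Connection refused')
    # transient? fatal? we are guessing from a string that can change
    # with the client (and `requests`) version
  end
rescue Errno::ENOENT
  # the CLI is not even installed - yet another case to special-case
end

# Native Ruby call: the same failure is a typed exception.
begin
  Net::HTTP.get_response(URI('http://127.0.0.1:5000/v3'))
rescue Errno::ECONNREFUSED, Net::OpenTimeout
  # unambiguous, version-independent, and trivially retryable
end
```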
> Matt and Gilles,
>
> Regarding puppet-healthcheck - I do not think that puppet-healthcheck
> handles exactly what I am mentioning here - it is not running at exactly
> the same time as we run the request.
>
> E.g. 10 seconds ago everything was OK, then we had a temporary
> connectivity issue, then everything is OK again in 10 seconds. Could you
> please describe how puppet-healthcheck can help us solve this problem?
>
> Or another example - there was an issue with keystone accessing the
> token database when you have several keystone instances running, or
> there was some desync between these instances, e.g. you fetched the
> token at keystone #1 and then you verify it against keystone #2.
> Keystone #2 had some issues verifying it, not because the token was bad,
> but because keystone #2 itself had some issues. We would get a 401 error
> and, instead of rerunning puppet, we would just need to handle this
> issue locally by retrying the request.
>
> [0] http://permalink.gmane.org/gmane.comp.cloud.openstack.devel/66423
>
> On Thu, Oct 15, 2015 at 12:23 PM, Gilles Dubreuil <[email protected]> wrote:
>
> > On 15/10/15 12:42, Matt Fischer wrote:
> > >
> > > On Thu, Oct 8, 2015 at 5:38 AM, Vladimir Kuklin <[email protected]> wrote:
> > >
> > > > Hi, folks
> > > >
> > > > * Intro
> > > >
> > > > Per our discussion at Meeting #54 [0] I would like to propose a
> > > > uniform approach to exception handling for all puppet-openstack
> > > > providers accessing any type of OpenStack API.
> > > >
> > > > * Problem Description
> > > >
> > > > While working on Fuel during deployment of multi-node HA-aware
> > > > environments we faced many intermittent operational issues, e.g.:
> > > >
> > > > 401/403 authentication failures when we were scaling the OpenStack
> > > > controllers, due to a difference in hashing view between keystone
> > > > instances
> > > > 503/502/504 errors due to temporary connectivity issues
> >
> > The 5xx errors are not connectivity issues:
> >
> > 500 Internal Server Error
> > 501 Not Implemented
> > 502 Bad Gateway
> > 503 Service Unavailable
> > 504 Gateway Timeout
> > 505 HTTP Version Not Supported
> >
> > I believe nothing should be done to trap them.
> >
> > The connectivity issues are a different matter (to be addressed as
> > mentioned by Matt).
> >
> > > > non-idempotent operations like deletion or creation - e.g. if you
> > > > are deleting an endpoint and someone is deleting it on another
> > > > node and you get a 404, you should continue with success instead
> > > > of failing. A 409 Conflict error should also signal us to re-fetch
> > > > the resource parameters and then decide what to do with them.
> > > >
> > > > Obviously, it is not optimal to rerun puppet to correct such
> > > > errors when we can just handle an exception properly.
> > > >
> > > > * Current State of Art
> > > >
> > > > There is some exception handling, but it does not cover all the
> > > > aforementioned use cases.
> > > >
> > > > * Proposed solution
> > > >
> > > > Introduce a library of exception handling methods which should be
> > > > the same for all puppet-openstack providers, as these exceptions
> > > > seem to be generic. Then, for each of the providers, we can
> > > > introduce provider-specific libraries that inherit from this one.
> > > >
> > > > Our mos-puppet team could add this to their backlog, work on it
> > > > upstream or downstream, and propose it upstream.
> > > >
> > > > What do you think of that, puppet folks?
> >
> > The real issue is that we're dealing with openstackclient, a CLI tool
> > and not an API. Therefore no error propagation is to be expected.
> >
> > Using the REST interfaces for all OpenStack APIs would expose all the
> > HTTP errors:
> >
> > Check "HTTP Response Classes" in
> > http://ruby-doc.org/stdlib-2.2.3/libdoc/net/http/rdoc/Net/HTTP.html
> >
> > > > [0]
> > > > http://eavesdrop.openstack.org/meetings/puppet_openstack/2015/puppet_openstack.2015-10-06-15.00.html
> > >
> > > I think that we should look into some solutions here as I'm
> > > generally for something we can solve once and re-use. Currently we
> > > solve some of this at TWC by serializing our deploys and disabling
> > > puppet site-wide while we do so. This avoids the issue of Keystone
> > > on one node removing an endpoint while the other nodes (which still
> > > have the old code) keep trying to add it back.
> > >
> > > For connectivity issues, especially after service restarts, we're
> > > using puppet-healthcheck [0] and I'd like to discuss that more in
> > > Tokyo as an alternative to explicit retries and delays. It's in the
> > > etherpad, so hopefully you can attend.
> > +1
> >
> > > [0] - https://github.com/puppet-community/puppet-healthcheck
>
> --
> Yours Faithfully,
> Vladimir Kuklin,
> Fuel Library Tech Lead,
> Mirantis, Inc.
> +7 (495) 640-49-04
> +7 (926) 702-39-68
> Skype kuklinvv
> 35bk3, Vorontsovskaya Str.
> Moscow, Russia,
> www.mirantis.com
> www.mirantis.ru
> [email protected]
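To close the loop on the 404/409 point and the Net::HTTP response classes
mentioned above, here is roughly the shape a shared handler could take
once we talk to the REST API directly. This is only a sketch to frame the
discussion - the module and method names are invented, not existing
puppet-openstack code:

```ruby
require 'net/http'
require 'uri'

# Sketch of a generic handler that all providers could share; provider-
# specific libraries would layer their own decisions on top of it.
module OpenstackRequestHandling
  def delete_resource(uri)
    response = Net::HTTP.start(uri.host, uri.port) do |http|
      http.delete(uri.request_uri)
    end

    case response
    when Net::HTTPSuccess, Net::HTTPNotFound
      # 404 on delete: someone else already removed it - treat as success.
      true
    when Net::HTTPConflict
      # 409: re-fetch the resource parameters and decide what to do.
      :refetch
    when Net::HTTPUnauthorized
      # 401 from a lagging keystone: worth one retry with a fresh token.
      :retry_with_new_token
    when Net::HTTPServerError
      # 5xx: the service answered but is unhealthy - fail loudly.
      raise "server error #{response.code} while deleting #{uri}"
    else
      raise "unexpected response #{response.code} while deleting #{uri}"
    end
  end
end
```

Whether something like this ends up in one shared library that
provider-specific code inherits from, as Vladimir proposes, or inside a
native Ruby client, the decisions it encodes stay the same.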
