On 09/11/2014 01:44 PM, Sean Dague wrote:
> On 09/10/2014 08:46 PM, Jamie Lennox wrote:
>>
>> ----- Original Message -----
>>> From: "Steven Hardy" <sha...@redhat.com>
>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>> <openstack-dev@lists.openstack.org>
>>> Sent: Thursday, September 11, 2014 1:55:49 AM
>>> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of
>>> retrying tokens leads to overall OpenStack fragility
>>>
>>> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>>>> Going through the untriaged Nova bugs, and there are a few on a
>>>> similar pattern:
>>>>
>>>> Nova operation in progress.... takes a while
>>>> Crosses keystone token expiration time
>>>> Timeout thrown
>>>> Operation fails
>>>> Terrible 500 error sent back to user
>>>
>>> We actually have this exact problem in Heat, which I'm currently
>>> trying to solve:
>>>
>>> https://bugs.launchpad.net/heat/+bug/1306294
>>>
>>> Can you clarify, is the issue either:
>>>
>>> 1. Create novaclient object with username/password
>>> 2. Do a series of operations via the client object which eventually
>>>    fail after $n operations due to token expiry
>>>
>>> or:
>>>
>>> 1. Create novaclient object with username/password
>>> 2. Run some really long operation, such that the token expires while
>>>    the service is handling the request, blowing up and 500-ing
>>>
>>> If the former, then it does sound like a client, or usage-of-client,
>>> bug, although note that if you pass a *token* vs username/password
>>> (as is currently done for glance and heat in tempest, because we
>>> lack the code to get the token outside of the shell.py code..),
>>> there's nothing the client can do, because you can't request a new
>>> token with a longer expiry using only a token...
>>>
>>> However, if the latter, then it seems like it's not really a client
>>> problem to solve, as it's hard to know what action to take if a
>>> request failed part-way through and thus things are in an unknown
>>> state.
>>>
>>> The latter is a hard problem, which can possibly be solved by
>>> switching to a trust-scoped token (the service impersonates the
>>> user), but then you're effectively bypassing token expiry via
>>> delegation, which sits uncomfortably with me (despite the fact that
>>> we may have to do this in heat to solve the aforementioned bug).
>>>
>>>> It seems like we should have a standard pattern where, on token
>>>> expiration, the underlying code at least gives one retry to try to
>>>> establish a new token to complete the flow; however, as far as I
>>>> can tell *no* clients do this.
>>>
>>> As has been mentioned, using sessions may be one solution to this,
>>> and AFAIK session support (where it doesn't already exist) is
>>> getting into various clients via the work being carried out to add
>>> support for v3 keystone by David Hu:
>>>
>>> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
>>>
>>> I see patches for Heat (currently gating), Nova and Ironic.
>>>
>>>> I know we had to add that into Tempest because tempest runs can
>>>> exceed 1 hr, and we want to avoid random fails just because we
>>>> cross a token expiration boundary.
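To make the "at least one retry" pattern Sean is asking for concrete,
here is a minimal sketch. This is purely illustrative: no client ships
a helper like this today, the helper and callback names are invented,
and each client raises its own Unauthorized type (keystoneclient's is
used here as a stand-in):

from keystoneclient import exceptions as ks_exc

def retry_once_on_expiry(operation, reauthenticate):
    # 'operation' is any no-argument client call; 'reauthenticate' is
    # whatever refreshes the token on the client object. Both names
    # are assumptions for illustration.
    try:
        return operation()
    except ks_exc.Unauthorized:
        # The token expired mid-flow: fetch a fresh one and retry
        # exactly once instead of failing the whole action.
        reauthenticate()
        return operation()

# Usage (names hypothetical):
# servers = retry_once_on_expiry(lambda: nova.servers.list(), relogin)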
>>>
>>> I can't claim great experience with sessions yet, but AIUI you
>>> could do something like:
>>>
>>> from keystoneclient.auth.identity import v3
>>> from keystoneclient import session
>>> from keystoneclient.v3 import client
>>>
>>> auth = v3.Password(auth_url=OS_AUTH_URL,
>>>                    username=USERNAME,
>>>                    password=PASSWORD,
>>>                    project_id=PROJECT,
>>>                    user_domain_name='default')
>>> sess = session.Session(auth=auth)
>>> ks = client.Client(session=sess)
>>>
>>> And if you can pass the same session into the various clients
>>> tempest creates, then the Password auth-plugin code takes care of
>>> reauthenticating if the token cached in the auth plugin object is
>>> expired, or nearly expired:
>>>
>>> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
>>>
>>> So in the tempest case, it seems like it may be a case of migrating
>>> the code creating the clients to use sessions instead of passing a
>>> token or username/password into the client object?
>>>
>>> That's my understanding of it atm anyway; hopefully jamielennox
>>> will be along soon with more details :)
>>>
>>> Steve
>>
>>
>> By clients here, are you referring to the CLIs or the python
>> libraries? Implementation is at different points with each.
>>
>> Sessions will handle automatically reauthenticating and retrying a
>> request; however, this relies on the service throwing a 401
>> Unauthorized error. If a service is returning a 500 (or a timeout?)
>> then there isn't much that a client can or should do, because we
>> can't assume that trying again with a new token will solve anything.
>>
>> At the moment we have keystoneclient, novaclient, cinderclient,
>> neutronclient and then a number of the smaller projects with support
>> for sessions. That obviously doesn't mean that existing users of
>> that code have transitioned to the newer way, though. David Hu has
>> been working on using this code within the existing CLIs. I have
>> prototypes for at least nova talking to neutron and cinder, which
>> I'm waiting for Kilo to push. From there it should be easier to do
>> this for other services.
>>
>> For service-to-service communication there are two types:
>> 1) Using the user's token, as nova->cinder does. If this token
>> expires, there is really nothing that nova can do except raise 401
>> and make the client do it again.
>
> In this case it would be really good to do at least 1 retry, because
> it's completely silly for us to fail an action based on a token
> timeout. The solution ops are using is changing their token
> expiration back to some really large number.
>
>> 2) Using a service user, as nova->neutron does. This should allow
>> automatic reauthentication and will be fixed/standardized by
>> sessions.
>
> OK, glanceclient should be a high target here, because that's often
> involved in long-running things (snapshot manipulation is slow).
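To make the session side of this concrete end to end: once the
session patches land in the individual clients, something like the
sketch below should work. The placeholders are the same as in Steve's
example above, and novaclient accepting session= is based on the
in-flight session work, so treat that part as an assumption rather
than a released API:

from keystoneclient.auth.identity import v3
from keystoneclient import session
from novaclient import client as nova_client

auth = v3.Password(auth_url=OS_AUTH_URL,
                   username=USERNAME,
                   password=PASSWORD,
                   project_id=PROJECT,
                   user_domain_name='default')
sess = session.Session(auth=auth)

# One session shared by every client: the Password plugin owns the
# token, re-authenticates when a service answers 401, and the request
# is retried, so a long run that crosses the expiry boundary simply
# picks up a fresh token and carries on.
nova = nova_client.Client('2', session=sess)
servers = nova.servers.list()

The same session object would then be handed to glanceclient,
cinderclient and friends as they grow support, which is also what
makes this attractive for the service-user (nova->neutron) case.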
Agreed that glanceclient is a good target. I started looking at this
a couple of weeks ago, but I'm still not sure what the best way to do
it is. The failure is common when uploading huge images, and I also
agree that at least 1 retry should be attempted.

Flavio

--
@flaper87
Flavio Percoco

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev