On 09/10/2014 08:46 PM, Jamie Lennox wrote:
>
> ----- Original Message -----
>> From: "Steven Hardy" <sha...@redhat.com>
>> To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev@lists.openstack.org>
>> Sent: Thursday, September 11, 2014 1:55:49 AM
>> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
>> tokens leads to overall OpenStack fragility
>>
>> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>>> Going through the untriaged Nova bugs, and there are a few on a similar
>>> pattern:
>>>
>>> Nova operation in progress.... takes a while
>>> Crosses keystone token expiration time
>>> Timeout thrown
>>> Operation fails
>>> Terrible 500 error sent back to user
>>
>> We actually have this exact problem in Heat, which I'm currently trying to
>> solve:
>>
>> https://bugs.launchpad.net/heat/+bug/1306294
>>
>> Can you clarify, is the issue either:
>>
>> 1. Create novaclient object with username/password
>> 2. Do series of operations via the client object which eventually fail
>>    after $n operations due to token expiry
>>
>> or:
>>
>> 1. Create novaclient object with username/password
>> 2. Some really long operation which means token expires in the course of
>>    the service handling the request, blowing up and 500-ing
>>
>> If the former, then it does sound like a client, or usage-of-client bug,
>> although note if you pass a *token* vs username/password (as is currently
>> done for glance and heat in tempest, because we lack the code to get the
>> token outside of the shell.py code..), there's nothing the client can do,
>> because you can't request a new token with longer expiry with a token...
>>
>> However if the latter, then it seems like not really a client problem to
>> solve, as it's hard to know what action to take if a request failed
>> part-way through and thus things are in an unknown state.
>>
>> This issue is a hard problem, which can possibly be solved by
>> switching to a trust scoped token (service impersonates the user), but then
>> you're effectively bypassing token expiry via delegation which sits
>> uncomfortably with me (despite the fact that we may have to do this in heat
>> to solve the aforementioned bug)
>>
>>> It seems like we should have a standard pattern that on token expiration
>>> the underlying code at least gives one retry to try to establish a new
>>> token to complete the flow, however as far as I can tell *no* clients do
>>> this.
>>
>> As has been mentioned, using sessions may be one solution to this, and
>> AFAIK session support (where it doesn't already exist) is getting into
>> various clients via the work being carried out to add support for v3
>> keystone by David Hu:
>>
>> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
>>
>> I see patches for Heat (currently gating), Nova and Ironic.
>>
>>> I know we had to add that into Tempest because tempest runs can exceed 1
>>> hr, and we want to avoid random fails just because we cross a token
>>> expiration boundary.
>>
>> I can't claim great experience with sessions yet, but AIUI you could do
>> something like:
>>
>> from keystoneclient.auth.identity import v3
>> from keystoneclient import session
>> from keystoneclient.v3 import client
>>
>> auth = v3.Password(auth_url=OS_AUTH_URL,
>>                    username=USERNAME,
>>                    password=PASSWORD,
>>                    project_id=PROJECT,
>>                    user_domain_name='default')
>> sess = session.Session(auth=auth)
>> ks = client.Client(session=sess)
>>
>> And if you can pass the same session into the various clients tempest
>> creates then the Password auth-plugin code takes care of reauthenticating
>> if the token cached in the auth plugin object is expired, or nearly
>> expired:
>>
>> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
>>
>> So in the tempest case, it seems like it may be a case of migrating the
>> code creating the clients to use sessions instead of passing a token or
>> username/password into the client object?
>>
>> That's my understanding of it atm anyway, hopefully jamielennox will be along
>> soon with more details :)
>>
>> Steve
>
>
> By clients here are you referring to the CLIs or the python libraries?
> Implementation is at different points with each.
>
> Sessions will handle automatically reauthenticating and retrying a request,
> however it relies on the service throwing a 401 Unauthenticated error. If a
> service is returning a 500 (or a timeout?) then there isn't much that a
> client can/should do for that because we can't assume that trying again with
> a new token will solve anything.
>
> At the moment we have keystoneclient, novaclient, cinderclient, neutronclient
> and then a number of the smaller projects with support for sessions. That
> obviously doesn't mean that existing users of that code have transitioned to
> the newer way though. David Hu has been working on using this code within the
> existing CLIs.
> I have prototypes for at least nova to talk to neutron and
> cinder which I'm waiting for Kilo to push. From there it should be easier to
> do this for other services.
>
> For service to service communication there are two types.
> 1) using the user's token like nova->cinder. If this token expires there is
> really nothing that nova can do except raise 401 and make the client do it
> again.
In this case it would be really good to do at least 1 retry, because it's
completely silly for us to fail an action based on a token timeout. The
workaround ops are using is to set their token expiration back to some
really large number.

> 2) using a service user like nova->neutron. This should allow automatic
> reauthentication and will be fixed/standardized by sessions.

Ok, glanceclient should be a high-priority target here, because that's
often involved in long running things (snapshot manip is slow).

	-Sean

-- 
Sean Dague
http://dague.net
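
For illustration only, the "retry once on token expiry" pattern discussed in this thread could be sketched roughly as below. This is not keystoneclient's implementation: Unauthorized, TokenCache and call_with_reauth are hypothetical names, and the authenticate callable (returning a token and its expiry timestamp) is an assumption standing in for a real username/password login. The sketch re-authenticates only on a 401 and lets anything else (a 500, a timeout) propagate, since a new token cannot be assumed to fix those.

```python
import time


class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 response from a service."""


class TokenCache:
    """Caches a token and re-authenticates when it is missing or near expiry."""

    def __init__(self, authenticate, leeway=30.0, clock=time.time):
        # authenticate is a caller-supplied callable returning (token, expires_at).
        self._authenticate = authenticate
        self._leeway = leeway
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Refresh slightly early so the token is less likely to expire mid-request.
        if self._token is None or self._clock() >= self._expires_at - self._leeway:
            self._token, self._expires_at = self._authenticate()
        return self._token

    def invalidate(self):
        # Forget the cached token so the next get_token() re-authenticates.
        self._token = None


def call_with_reauth(cache, request):
    """Call request(token); on Unauthorized, fetch a fresh token and retry once.

    Any other exception propagates unchanged and is not retried.
    """
    try:
        return request(cache.get_token())
    except Unauthorized:
        cache.invalidate()
        return request(cache.get_token())
```

Note that this only works when credentials are available to authenticate(); as pointed out above, a caller holding nothing but a user's token has no way to obtain a replacement.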