Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

Sean Dague Thu, 11 Sep 2014 04:47:53 -0700

On 09/10/2014 08:46 PM, Jamie Lennox wrote:
> 
> ----- Original Message -----
>> From: "Steven Hardy" <sha...@redhat.com>
>> To: "OpenStack Development Mailing List (not for usage questions)" 
>> <openstack-dev@lists.openstack.org>
>> Sent: Thursday, September 11, 2014 1:55:49 AM
>> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
>> tokens leads to overall OpenStack fragility
>>
>> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>>> Going through the untriaged Nova bugs, there are a few that follow a
>>> similar pattern:
>>>
>>> Nova operation in progress.... takes a while
>>> Crosses keystone token expiration time
>>> Timeout thrown
>>> Operation fails
>>> Terrible 500 error sent back to user
>>
>> We actually have this exact problem in Heat, which I'm currently trying to
>> solve:
>>
>> https://bugs.launchpad.net/heat/+bug/1306294
>>
>> Can you clarify, is the issue either:
>>
>> 1. Create novaclient object with username/password
>> 2. Do series of operations via the client object which eventually fail
>> after $n operations due to token expiry
>>
>> or:
>>
>> 1. Create novaclient object with username/password
>> 2. Some really long operation which means token expires in the course of
>> the service handling the request, blowing up and 500-ing
>>
>> If the former, then it does sound like a client, or usage-of-client bug,
>> although note if you pass a *token* vs username/password (as is currently
>> done for glance and heat in tempest, because we lack the code to get the
>> token outside of the shell.py code..), there's nothing the client can do,
>> because you can't request a new token with longer expiry with a token...
>>
>> However if the latter, then it seems like not really a client problem to
>> solve, as it's hard to know what action to take if a request failed
>> part-way through and thus things are in an unknown state.
>>
>> This issue is a hard problem, which can possibly be solved by
>> switching to a trust scoped token (service impersonates the user), but then
>> you're effectively bypassing token expiry via delegation which sits
>> uncomfortably with me (despite the fact that we may have to do this in heat
>> to solve the aforementioned bug).
>>
>>> It seems like we should have a standard pattern that on token expiration
>>> the underlying code at least gives one retry to try to establish a new
>>> token to complete the flow, however as far as I can tell *no* clients do
>>> this.
>>
>> As has been mentioned, using sessions may be one solution to this, and
>> AFAIK session support (where it doesn't already exist) is getting into
>> various clients via the work being carried out to add support for v3
>> keystone by David Hu:
>>
>> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
>>
>> I see patches for Heat (currently gating), Nova and Ironic.
>>
>>> I know we had to add that into Tempest because tempest runs can exceed 1
>>> hr, and we want to avoid random fails just because we cross a token
>>> expiration boundary.
>>
>> I can't claim great experience with sessions yet, but AIUI you could do
>> something like:
>>
>> from keystoneclient.auth.identity import v3
>> from keystoneclient import session
>> from keystoneclient.v3 import client
>>
>> auth = v3.Password(auth_url=OS_AUTH_URL,
>>                    username=USERNAME,
>>                    password=PASSWORD,
>>                    project_id=PROJECT,
>>                    user_domain_name='default')
>> sess = session.Session(auth=auth)
>> ks = client.Client(session=sess)
>>
>> And if you can pass the same session into the various clients tempest
>> creates then the Password auth-plugin code takes care of reauthenticating
>> if the token cached in the auth plugin object is expired, or nearly
>> expired:
>>
>> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
>>
>> So in the tempest case, it seems like it may be a case of migrating the
>> code creating the clients to use sessions instead of passing a token or
>> username/password into the client object?
>>
>> That's my understanding of it atm anyway, hopefully jamielennox will be along
>> soon with more details :)
>>
>> Steve
> 
> 
> By clients here are you referring to the CLIs or the python libraries? 
> Implementation is at different points with each. 
> 
> Sessions will handle automatically reauthenticating and retrying a request, 
> however it relies on the service throwing a 401 Unauthorized error. If a 
> service is returning a 500 (or a timeout?) then there isn't much that a 
> client can/should do for that because we can't assume that trying again with 
> a new token will solve anything. 
> 
> At the moment we have keystoneclient, novaclient, cinderclient, neutronclient 
> and then a number of the smaller projects with support for sessions. That 
> obviously doesn't mean that existing users of that code have transitioned to 
> the newer way though. David Hu has been working on using this code within the 
> existing CLIs. I have prototypes for at least nova to talk to neutron and 
> cinder which I'm waiting for Kilo to push. From there it should be easier to 
> do this for other services. 
> 
> For service to service communication there are two types.
> 1) using the user's token like nova->cinder. If this token expires there is 
> really nothing that nova can do except raise 401 and make the client do it 
> again.


In this case it would be really good to do at least 1 retry, because
it's completely silly for us to fail an action based on a token timeout.
The workaround ops are using is to set their token expiration back to
some really large number.
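
To make "at least 1 retry" concrete, here is a rough sketch of the pattern,
not code from any existing client. Every name in it (Unauthorized,
call_with_one_reauth, get_token, invalidate) is hypothetical. It assumes
the 401 comes back before the service has done any work, and that the
caller can actually obtain a new token, which is not true when all it
holds is the user's token.

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for the error a client raises on an HTTP 401."""


def call_with_one_reauth(request, get_token, invalidate):
    """Run request(token); on a 401, get a fresh token and retry exactly once.

    request, get_token and invalidate are caller-supplied callables
    (assumed for this sketch, not part of any real client API).
    """
    token = get_token()
    try:
        return request(token)
    except Unauthorized:
        # The cached token was rejected, most likely because it expired.
        invalidate()
        # One retry only: a second 401 propagates to the caller.
        return request(get_token())
```

Anything other than a 401 (a 500, a timeout) is deliberately not retried,
since a new token would not be expected to fix it.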

> 2) using a service user like nova->neutron. This should allow automatic 
> reauthentication and will be fixed/standardied by sessions. 

Ok, glanceclient should be a high-priority target here, because that's
often involved in long-running things (snapshot manipulation is slow).
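
For slow operations like these, where the caller holds credentials (the
service user case), one option might be to check the remaining token
lifetime up front and reauthenticate before starting, much like the
"expired, or nearly expired" check mentioned above. A minimal sketch,
with needs_refresh and min_lifetime as made-up names:

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expires_at, min_lifetime, now=None):
    """Return True if the token will not outlive an operation of min_lifetime.

    expires_at is a timezone-aware datetime; min_lifetime is a timedelta
    estimate of how long the operation may take (an assumption the caller
    has to supply).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now < min_lifetime
```

This only narrows the window; an operation that runs longer than the
estimate can still cross the expiration boundary.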

        -Sean

-- 
Sean Dague
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
