Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-17 Thread Angus Lees
On Wed, 17 Sep 2014 04:53:28 PM Duncan Thomas wrote:
> On 16 September 2014 01:28, Nathan Kinder  wrote:
> > The idea would be to leave normal tokens with a smaller validity period
> > (like the current default of an hour), but also allow one-time use
> > tokens to be requested.
> 
> Cinder backup makes many requests to swift during a backup, one per
> chunk to be uploaded plus one or more for the metadata file.

Right, and what if the HTTP connection times out and needs to be retried?
Can I reuse my "single use" token?

Also: single-use tokens scale badly since they need a strongly consistent 
validation point that in normal use requires frequent writes.
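
A minimal sketch of both points, assuming a hypothetical in-memory store standing in for the validation point (this is not Keystone code). Each successful validation is also a write, and a retry with the same token fails:

```python
import threading


class OneTimeTokenStore:
    """Hypothetical stand-in for a single-use token validation point."""

    def __init__(self):
        self._lock = threading.Lock()
        self._unused = set()

    def issue(self, token_id):
        with self._lock:
            self._unused.add(token_id)

    def validate(self, token_id):
        # Check-and-consume must be atomic, so every successful
        # validation is a write against shared, consistent state.
        with self._lock:
            if token_id in self._unused:
                self._unused.remove(token_id)
                return True
            return False


store = OneTimeTokenStore()
store.issue("tok-1")
first_attempt = store.validate("tok-1")  # request arrives, reply is lost
retry_attempt = store.validate("tok-1")  # client retries with same token
print(first_attempt, retry_attempt)
```

With more than one validation server the set above would have to be shared and strongly consistent, which is where the scaling cost comes from.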

-- 
 - Gus

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-17 Thread Duncan Thomas
On 16 September 2014 01:28, Nathan Kinder  wrote:
> The idea would be to leave normal tokens with a smaller validity period
> (like the current default of an hour), but also allow one-time use
> tokens to be requested.

Cinder backup makes many requests to swift during a backup, one per
chunk to be uploaded plus one or more for the metadata file.
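
As rough arithmetic, assuming an illustrative 50 MiB chunk size and a single metadata request (both figures are assumptions, not taken from cinder's configuration):

```python
def swift_requests_for_backup(volume_bytes, chunk_bytes, metadata_requests=1):
    """Count requests: one per chunk (rounded up) plus metadata uploads."""
    chunks = (volume_bytes + chunk_bytes - 1) // chunk_bytes
    return chunks + metadata_requests


GiB = 1024 ** 3
MiB = 1024 ** 2
requests_needed = swift_requests_for_backup(100 * GiB, 50 * MiB)
print(requests_needed)
```

A one-time token would cover only the first of those requests.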



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-16 Thread Adam Young

On 09/15/2014 08:28 PM, Nathan Kinder wrote:
> On 09/12/2014 12:46 AM, Angus Lees wrote:
>> On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:
>>> On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
>>>> For service to service communication there are two types.
>>>> 1) using the user's token like nova->cinder. If this token expires there
>>>> is really nothing that nova can do except raise 401 and make the client
>>>> do it again. 2) using a service user like nova->neutron. This should
>>>> allow automatic reauthentication and will be fixed/standardized by
>>>> sessions.
>>> (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
>>> least) there seems to be two solutions, neither of which I particularly
>>> like:
>>>
>>> - Require username/password to be passed into the service (something we've
>>>   been trying to banish via migrating to trusts for deferred
>>>   authentication)
>>> - Create a trust, and impersonate the user for the duration of the request,
>>>   or after the token expires until it is completed, using the service user
>>>   credentials and the trust_id.
>>>
>>> It's the second one which I'm deliberating over - technically it will work,
>>> and we create the trust anyway (e.g for later use to do autoscaling etc),
>>> but can anyone from the keystone team comment on the legitimacy of the
>>> approach?
>>>
>>> Intuitively it seems wrong, but I can't see any other way if we want to
>>> support token-only auth and cope with folks doing stuff which takes 2 hours
>>> with a 1 hour token expiry?
>>
>> A possible 3rd option is some sort of longer lived, but limited scope
>> "capability token".
>>
>> The user would create a capability token that represents "anyone possessing
>> this token is (eg) allowed to write to swift as $user".  The token could be
>> created by keystone as a trusted 3rd party or by swift (doesn't matter which),
>> in response to a request authenticated as $user.  The client then includes
>> that token in the request *to cinder*, so cinder can pass it back to swift
>> when doing the writes.
>> This capability token would be of much longer duration (long enough to
>> complete the cinder->swift task), which is ok because it is of a much more
>> limited scope (ideally as fine grained as we can bother implementing).
>
> With UUID tokens, it would even be possible to implement a "one-time
> use" sort of token.  Since Keystone needs to be asked to validate a UUID
> token, the token could be invalidated by Keystone after the first
> verification.  Since the token is limited based off of number of times
> of usage, there should be less concerns about a long validity period
> (though it would make sense to use something sane still).  This approach
> wouldn't be possible with PKI tokens since Keystone is not in the
> validation path.
>
> Your idea of passing the "capability token" in the request would work
> well with this, as the token only needs to be extracted and used once
> instead of being passed from service to service and validated at each
> hop (user->cinder->swift in your example).
>
> The idea would be to leave normal tokens with a smaller validity period
> (like the current default of an hour), but also allow one-time use
> tokens to be requested.


It is dumb to make a service get a token just to hand the token back to
Keystone.

Guang Yee has pushed for years to get a capability into Keystone where
certain API calls would not require a token; instead, the permission would
be based on whatever the user's capabilities were at the time.

The problem is that "Admin" in the default policy (and hardcoded in V2)
is defined to mean "User has the role admin on anything", which is, of
course, suboptimal (to say the least).

So validating a token should not require a token.  We could add to the
request some standard stanza for saying "Here is the project/domain that
I want to do this with" so that we can at least keep Keystone's current
behavior somewhat sane.

>> (I like this option)


>> A 4th option is to have much longer lived tokens everywhere (long enough for
>> this backup), but the user is able to expire it early via keystone whenever
>> they feel it might be compromised (aiui this is exactly how things work now -
>> we just need to increase the timeout).  Greater exposure to replay attacks,
>> but if detected they can still be invalidated quickly.
>>
>> (This is the easiest option, it's basically just formalising what the
>> operators are already doing)
>>
>>
>> A 5th option (wow) is to have the end user/client repeatedly push in fresh
>> tokens during long-running operations (and heat is the uber-example since it
>> basically wants to impersonate the user forever).  Those tokens would then
>> need to be refreshed all the way down the stack for any outstanding operations
>> that might need the new token.
>>
>> (This or the 4th option seems ugly but unavoidable for "forever" services like
>> heat.  There has to be some way to invalidate their access if they go rogue,
>> either by time (and thus needs a refresh mechanism) or by
>> invalidation-via-keystone (which implies the token lasts forever unless
>> invalidated))
>
> I think Keystone trusts are better

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-15 Thread Nathan Kinder


On 09/12/2014 12:46 AM, Angus Lees wrote:
> On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:
>> On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
>>> For service to service communication there are two types.
>>> 1) using the user's token like nova->cinder. If this token expires there
>>> is really nothing that nova can do except raise 401 and make the client
>>> do it again. 2) using a service user like nova->neutron. This should
>>> allow automatic reauthentication and will be fixed/standardized by
>>> sessions.
>> (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
>> least) there seems to be two solutions, neither of which I particularly
>> like:
>>
>> - Require username/password to be passed into the service (something we've
>>   been trying to banish via migrating to trusts for deferred
>>   authentication)
>> - Create a trust, and impersonate the user for the duration of the request,
>>   or after the token expires until it is completed, using the service user
>>   credentials and the trust_id.
>>
>> It's the second one which I'm deliberating over - technically it will work,
>> and we create the trust anyway (e.g for later use to do autoscaling etc),
>> but can anyone from the keystone team comment on the legitimacy of the
>> approach?
>>
>> Intuitively it seems wrong, but I can't see any other way if we want to
>> support token-only auth and cope with folks doing stuff which takes 2 hours
>> with a 1 hour token expiry?
> 
> A possible 3rd option is some sort of longer lived, but limited scope 
> "capability token".
> 
> The user would create a capability token that represents "anyone possessing 
> this token is (eg) allowed to write to swift as $user".  The token could be 
> created by keystone as a trusted 3rd party or by swift (doesn't matter which),
> in response to a request authenticated as $user.  The client then includes
> that token in the request *to cinder*, so cinder can pass it back to swift 
> when doing the writes.
> This capability token would be of much longer duration (long enough to 
> complete the cinder->swift task), which is ok because it is of a much more 
> limited scope (ideally as fine grained as we can bother implementing).

With UUID tokens, it would even be possible to implement a "one-time
use" sort of token.  Since Keystone needs to be asked to validate a UUID
token, the token could be invalidated by Keystone after the first
verification.  Since the token is limited based off of number of times
of usage, there should be less concerns about a long validity period
(though it would make sense to use something sane still).  This approach
wouldn't be possible with PKI tokens since Keystone is not in the
validation path.

Your idea of passing the "capability token" in the request would work
well with this, as the token only needs to be extracted and used once
instead of being passed from service to service and validated at each
hop (user->cinder->swift in your example).

The idea would be to leave normal tokens with a smaller validity period
(like the current default of an hour), but also allow one-time use
tokens to be requested.

> 
> (I like this option)
> 
> 
> A 4th option is to have much longer lived tokens everywhere (long enough for 
> this backup), but the user is able to expire it early via keystone whenever 
> they feel it might be compromised (aiui this is exactly how things work now - 
> we just need to increase the timeout).  Greater exposure to replay attacks, 
> but if detected they can still be invalidated quickly.
> 
> (This is the easiest option, it's basically just formalising what the 
> operators are already doing)
> 
> 
> A 5th option (wow) is to have the end user/client repeatedly push in fresh 
> tokens during long-running operations (and heat is the uber-example since it 
> basically wants to impersonate the user forever).  Those tokens would then 
> need to be refreshed all the way down the stack for any outstanding operations
> that might need the new token.
> 
> (This or the 4th option seems ugly but unavoidable for "forever" services like
> heat.  There has to be some way to invalidate their access if they go rogue,
> either by time (and thus needs a refresh mechanism) or by
> invalidation-via-keystone (which implies the token lasts forever unless
> invalidated))

I think Keystone trusts are better for "forever" services, though I see
no reason why a trust token also couldn't have a limited number of uses
with a longer validity period.  The trust itself doesn't need an
expiration, so the trust can be executed at some future point in time to
get a limited use trust token.

> 
> 
> However we do it:  the "permission" to do the action should come from the 
> original user - and this is expressed as tokens coming from the original 
> client/user in some form.   By allowing services to create something without 
> the original client/user being involved, we're really just bypassing the 
> token 
> authen

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-15 Thread Brant Knudson
On Wed, Sep 10, 2014 at 9:14 AM, Sean Dague  wrote:

> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
>
> Nova operation in progress takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user
>
> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.
>
> I know we had to add that into Tempest because tempest runs can exceed 1
> hr, and we want to avoid random fails just because we cross a token
> expiration boundary.
>
> Anyone closer to the clients that can comment here?
>
> -Sean
>
>
Currently, a service with a token can't always refresh a new token, because
the service doesn't always have the user's credentials (which is good...
the service shouldn't have the user's credentials), and even if the
credentials were available the service might not be able to use them to
authenticate (not all authentication is done using username and password).
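
For the narrower case where the caller does hold usable credentials, the single-retry pattern Sean describes could be sketched as follows. The `Unauthorized` exception and both callables are hypothetical stand-ins, not part of any OpenStack client:

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 from a client library."""


def call_with_one_reauth(do_request, get_fresh_token, token):
    """Run do_request(token); on a 401, fetch a new token and retry once."""
    try:
        return do_request(token)
    except Unauthorized:
        # Exactly one retry. A second 401 propagates to the caller.
        return do_request(get_fresh_token())


# Usage with fakes: only "new-token" is accepted.
valid_tokens = {"new-token"}


def fake_request(token):
    if token not in valid_tokens:
        raise Unauthorized(token)
    return "200 OK"


result = call_with_one_reauth(fake_request, lambda: "new-token",
                              "expired-token")
print(result)
```

This is only safe when the request can be repeated without side effects, and it does nothing for a service that holds just the user's token.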

The most obvious solution to me is to have the identity server provide an
API where, given a token, you can get a new token with an expiration time
of your choice. Use of the API would be limited to service users. When a
service gets a token that it wants to send on to another service it first
uses the existing token to get a new token with whatever expiration time it
thinks would be adequate. If the service knows that it's done with the
token it will hopefully revoke the new token to keep the token database
clean.

The only thing missing from the existing auth API for getting a token from
a token is being able to set the expiration time --
https://github.com/openstack/identity-api/blob/master/v3/src/markdown/identity-api-v3.md#authentication-authentication
. Keystone will also have to be enhanced to validate that if the
token-from-token request has a new expiration time the requestor has the
required role.
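
A sketch of what such a request body might look like. The `identity`/`methods`/`token` structure follows the existing v3 token-from-token authentication request; the `expires_at` field and its placement are purely hypothetical and are not part of the Identity API:

```python
import json


def token_from_token_body(existing_token_id, requested_expires_at=None):
    """Build a v3 auth body that authenticates with an existing token."""
    body = {
        "auth": {
            "identity": {
                "methods": ["token"],
                "token": {"id": existing_token_id},
            }
        }
    }
    if requested_expires_at is not None:
        # HYPOTHETICAL field: illustrates the proposed extension only.
        body["auth"]["expires_at"] = requested_expires_at
    return body


body = token_from_token_body("abc123", "2014-09-15T18:00:00Z")
print(json.dumps(body, indent=2, sort_keys=True))
```

The server-side role check on the requested expiry is the part that would need new Keystone code.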

- Brant


Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Steven Hardy
On Thu, Sep 11, 2014 at 08:43:22PM -0400, Jamie Lennox wrote:
> 
> 
> - Original Message -
> > From: "Steven Hardy" 
> > To: "OpenStack Development Mailing List (not for usage questions)" 
> > 
> > Sent: Friday, 12 September, 2014 12:21:52 AM
> > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
> > tokens leads to overall OpenStack fragility
> > 
> > On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> > > 
> > > - Original Message -
> > > > From: "Steven Hardy" 
> > > > To: "OpenStack Development Mailing List (not for usage questions)"
> > > > 
> > > > Sent: Thursday, September 11, 2014 1:55:49 AM
> > > > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> > > > tokens leads to overall OpenStack fragility
> > > > 
> > > > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > > > Going through the untriaged Nova bugs, and there are a few on a 
> > > > > similar
> > > > > pattern:
> > > > > 
> > > > > Nova operation in progress takes a while
> > > > > Crosses keystone token expiration time
> > > > > Timeout thrown
> > > > > Operation fails
> > > > > Terrible 500 error sent back to user
> > > > 
> > > > We actually have this exact problem in Heat, which I'm currently trying
> > > > to
> > > > solve:
> > > > 
> > > > https://bugs.launchpad.net/heat/+bug/1306294
> > > > 
> > > > Can you clarify, is the issue either:
> > > > 
> > > > 1. Create novaclient object with username/password
> > > > 2. Do series of operations via the client object which eventually fail
> > > > after $n operations due to token expiry
> > > > 
> > > > or:
> > > > 
> > > > 1. Create novaclient object with username/password
> > > > 2. Some really long operation which means token expires in the course of
> > > > the service handling the request, blowing up and 500-ing
> > > > 
> > > > If the former, then it does sound like a client, or usage-of-client bug,
> > > > although note if you pass a *token* vs username/password (as is 
> > > > currently
> > > > done for glance and heat in tempest, because we lack the code to get the
> > > > token outside of the shell.py code..), there's nothing the client can 
> > > > do,
> > > > because you can't request a new token with longer expiry with a token...
> > > > 
> > > > However if the latter, then it seems like not really a client problem to
> > > > solve, as it's hard to know what action to take if a request failed
> > > > part-way through and thus things are in an unknown state.
> > > > 
> > > > This issue is a hard problem, which can possibly be solved by
> > > > switching to a trust scoped token (service impersonates the user), but
> > > > then
> > > > you're effectively bypassing token expiry via delegation which sits
> > > > uncomfortably with me (despite the fact that we may have to do this in
> > > > heat
> > > > > to solve the aforementioned bug)
> > > > 
> > > > > It seems like we should have a standard pattern that on token
> > > > > expiration
> > > > > the underlying code at least gives one retry to try to establish a new
> > > > > token to complete the flow, however as far as I can tell *no* clients
> > > > > do
> > > > > this.
> > > > 
> > > > As has been mentioned, using sessions may be one solution to this, and
> > > > AFAIK session support (where it doesn't already exist) is getting into
> > > > various clients via the work being carried out to add support for v3
> > > > keystone by David Hu:
> > > > 
> > > > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > > > 
> > > > I see patches for Heat (currently gating), Nova and Ironic.
> > > > 
> > > > > I know we had to add that into Tempest because tempest runs can exceed
> > > > > 1
> > > > > hr, and we want to avoid random fails just because we cross a token
> > > > > expiration boundary.
> > > > 
> > > > I c

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Flavio Percoco
On 09/11/2014 01:44 PM, Sean Dague wrote:
> On 09/10/2014 08:46 PM, Jamie Lennox wrote:
>>
>> - Original Message -
>>> From: "Steven Hardy" 
>>> To: "OpenStack Development Mailing List (not for usage questions)" 
>>> 
>>> Sent: Thursday, September 11, 2014 1:55:49 AM
>>> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
>>> tokens leads to overall OpenStack fragility
>>>
>>> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>>>> Going through the untriaged Nova bugs, and there are a few on a similar
>>>> pattern:
>>>>
>>>> Nova operation in progress takes a while
>>>> Crosses keystone token expiration time
>>>> Timeout thrown
>>>> Operation fails
>>>> Terrible 500 error sent back to user
>>>
>>> We actually have this exact problem in Heat, which I'm currently trying to
>>> solve:
>>>
>>> https://bugs.launchpad.net/heat/+bug/1306294
>>>
>>> Can you clarify, is the issue either:
>>>
>>> 1. Create novaclient object with username/password
>>> 2. Do series of operations via the client object which eventually fail
>>> after $n operations due to token expiry
>>>
>>> or:
>>>
>>> 1. Create novaclient object with username/password
>>> 2. Some really long operation which means token expires in the course of
>>> the service handling the request, blowing up and 500-ing
>>>
>>> If the former, then it does sound like a client, or usage-of-client bug,
>>> although note if you pass a *token* vs username/password (as is currently
>>> done for glance and heat in tempest, because we lack the code to get the
>>> token outside of the shell.py code..), there's nothing the client can do,
>>> because you can't request a new token with longer expiry with a token...
>>>
>>> However if the latter, then it seems like not really a client problem to
>>> solve, as it's hard to know what action to take if a request failed
>>> part-way through and thus things are in an unknown state.
>>>
>>> This issue is a hard problem, which can possibly be solved by
>>> switching to a trust scoped token (service impersonates the user), but then
>>> you're effectively bypassing token expiry via delegation which sits
>>> uncomfortably with me (despite the fact that we may have to do this in heat
>>> to solve the aforementioned bug)
>>>
>>>> It seems like we should have a standard pattern that on token expiration
>>>> the underlying code at least gives one retry to try to establish a new
>>>> token to complete the flow, however as far as I can tell *no* clients do
>>>> this.
>>>
>>> As has been mentioned, using sessions may be one solution to this, and
>>> AFAIK session support (where it doesn't already exist) is getting into
>>> various clients via the work being carried out to add support for v3
>>> keystone by David Hu:
>>>
>>> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
>>>
>>> I see patches for Heat (currently gating), Nova and Ironic.
>>>
>>>> I know we had to add that into Tempest because tempest runs can exceed 1
>>>> hr, and we want to avoid random fails just because we cross a token
>>>> expiration boundary.
>>>
>>> I can't claim great experience with sessions yet, but AIUI you could do
>>> something like:
>>>
>>> from keystoneclient.auth.identity import v3
>>> from keystoneclient import session
>>> from keystoneclient.v3 import client
>>>
>>> auth = v3.Password(auth_url=OS_AUTH_URL,
>>>                    username=USERNAME,
>>>                    password=PASSWORD,
>>>                    project_id=PROJECT,
>>>                    user_domain_name='default')
>>> sess = session.Session(auth=auth)
>>> ks = client.Client(session=sess)
>>>
>>> And if you can pass the same session into the various clients tempest
>>> creates then the Password auth-plugin code takes care of reauthenticating
>>> if the token cached in the auth plugin object is expired, or nearly
>>> expired:
>>>
>>> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
>>>
>>> So in the tempest case, it seems like it may be a case of migrating 

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Angus Lees
On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:
> On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> > For service to service communication there are two types.
> > 1) using the user's token like nova->cinder. If this token expires there
> > is really nothing that nova can do except raise 401 and make the client
> > do it again. 2) using a service user like nova->neutron. This should
> > allow automatic reauthentication and will be fixed/standardized by
> > sessions.
> (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
> least) there seems to be two solutions, neither of which I particularly
> like:
> 
> - Require username/password to be passed into the service (something we've
>   been trying to banish via migrating to trusts for deferred
>   authentication)
> - Create a trust, and impersonate the user for the duration of the request,
>   or after the token expires until it is completed, using the service user
>   credentials and the trust_id.
> 
> It's the second one which I'm deliberating over - technically it will work,
> and we create the trust anyway (e.g for later use to do autoscaling etc),
> but can anyone from the keystone team comment on the legitimacy of the
> approach?
> 
> Intuitively it seems wrong, but I can't see any other way if we want to
> support token-only auth and cope with folks doing stuff which takes 2 hours
> with a 1 hour token expiry?

A possible 3rd option is some sort of longer lived, but limited scope 
"capability token".

The user would create a capability token that represents "anyone possessing 
this token is (eg) allowed to write to swift as $user".  The token could be 
created by keystone as a trusted 3rd party or by swift (doesn't matter which), 
in response to a request authenticated as $user.  The client then includes 
that token in the request *to cinder*, so cinder can pass it back to swift 
when doing the writes.
This capability token would be of much longer duration (long enough to 
complete the cinder->swift task), which is ok because it is of a much more 
limited scope (ideally as fine grained as we can bother implementing).

(I like this option)
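
A minimal sketch of the idea, assuming an HMAC-signed claim set and a secret shared between issuer and verifier. The key, claim names and function names are all hypothetical; nothing like this exists in keystone or swift today:

```python
import hashlib
import hmac
import json
import time

SECRET = b"issuer-secret"  # hypothetical key shared by issuer and verifier


def issue_capability(user, action, resource, lifetime_s, now=None):
    """Return (payload, signature) granting one action on one resource."""
    now = time.time() if now is None else now
    claims = {"user": user, "action": action, "resource": resource,
              "expires": now + lifetime_s}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload, sig


def check_capability(payload, sig, action, resource, now=None):
    """Accept only an untampered, unexpired token for this exact scope."""
    now = time.time() if now is None else now
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    claims = json.loads(payload)
    return (claims["action"] == action
            and claims["resource"] == resource
            and now < claims["expires"])


# Six-hour capability to write to one container, checked two hours in.
payload, sig = issue_capability("alice", "write", "swift:backups",
                                6 * 3600, now=0)
print(check_capability(payload, sig, "write", "swift:backups", now=7200))
print(check_capability(payload, sig, "delete", "swift:backups", now=7200))
```

The long lifetime is tolerable here only because the scope check rejects everything except the one named action and resource.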


A 4th option is to have much longer lived tokens everywhere (long enough for 
this backup), but the user is able to expire it early via keystone whenever 
they feel it might be compromised (aiui this is exactly how things work now - 
we just need to increase the timeout).  Greater exposure to replay attacks, 
but if detected they can still be invalidated quickly.

(This is the easiest option, it's basically just formalising what the 
operators are already doing)


A 5th option (wow) is to have the end user/client repeatedly push in fresh 
tokens during long-running operations (and heat is the uber-example since it 
basically wants to impersonate the user forever).  Those tokens would then 
need to be refreshed all the way down the stack for any outstanding operations 
that might need the new token.

(This or the 4th option seems ugly but unavoidable for "forever" services like 
heat.  There has to be some way to invalidate their access if they go rogue, 
either by time (and thus needs a refresh mechanism) or by invalidation-via-
keystone (which implies the token lasts forever unless invalidated))


However we do it:  the "permission" to do the action should come from the 
original user - and this is expressed as tokens coming from the original 
client/user in some form.   By allowing services to create something without 
the original client/user being involved, we're really just bypassing the token 
authentication mechanism (and there are easier ways to ignore the token ;)

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Angus Lees
On Thu, 11 Sep 2014 03:00:02 PM Duncan Thomas wrote:
> On 11 September 2014 03:17, Angus Lees  wrote:
> > (As inspired by eg kerberos)
> > 2. Ensure at some environmental/top layer that the advertised token
> > lifetime exceeds the timeout set on the request, before making the
> > request.  This implies (since there's no special handling in place)
> > failing if the token was expired earlier than expected.
> 
> We've a related problem in cinder (cinder-backup uses the user's token
> to talk to swift, and the backup can easily take longer than the token
> expiry time) which could not be solved by this, since the time the
> backup takes is unknown (compression, service and resource contention,
> etc alter the time by multiple orders of magnitude)

Yes, this sounds like another example of the cross-service problem I was 
describing with refreshing the token at the bottom layer - but I disagree that 
this is handled any better by refreshing tokens on-demand at the bottom layer.

In order to have cinder refresh the token while talking to swift, it needs to 
know the user's password (ouch - why even have the token) or have magic token 
creating powers (in which case none of this matters, because cinder can just 
create tokens any time it wants).

As far as I can see, we either need to be able to 1) generate tokens that _do_ 
last "long enough", 2) pass user+password to cinder so it is capable of 
creating new tokens as necessary, or 3) only perform token-based auth once at 
the start of a long cinder<->glance workflow like this, and then use some sort 
of limited-scope-but-unlimited-time "session token" for follow-on requests.

I think I'm advocating for (1) or (3), and (2) as a distant third.


... Unless there's some other option here?  Your dismissal above sounded like 
there was already a solution for this - what's the current solution?
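
For reference, the up-front check quoted above (advertised token lifetime versus request timeout) could be sketched like this. The function name and arguments are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def token_outlives_request(expires_at, request_timeout_s, now=None):
    """True if the token's advertised expiry is later than now + timeout."""
    now = datetime.now(timezone.utc) if now is None else now
    return expires_at - now > timedelta(seconds=request_timeout_s)


now = datetime(2014, 9, 12, 12, 0, tzinfo=timezone.utc)
expires_at = now + timedelta(hours=1)
print(token_outlives_request(expires_at, 600, now))   # 10 minute request
print(token_outlives_request(expires_at, 7200, now))  # 2 hour request
```

As Duncan points out, this only helps when the operation has a known upper bound on its duration, which a backup does not.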

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Jamie Lennox


- Original Message -
> From: "Steven Hardy" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Friday, 12 September, 2014 12:21:52 AM
> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
> tokens leads to overall OpenStack fragility
> 
> On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> > 
> > - Original Message -
> > > From: "Steven Hardy" 
> > > To: "OpenStack Development Mailing List (not for usage questions)"
> > > 
> > > Sent: Thursday, September 11, 2014 1:55:49 AM
> > > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> > > tokens leads to overall OpenStack fragility
> > > 
> > > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > > Going through the untriaged Nova bugs, and there are a few on a similar
> > > > pattern:
> > > > 
> > > > Nova operation in progress takes a while
> > > > Crosses keystone token expiration time
> > > > Timeout thrown
> > > > Operation fails
> > > > Terrible 500 error sent back to user
> > > 
> > > We actually have this exact problem in Heat, which I'm currently trying
> > > to
> > > solve:
> > > 
> > > https://bugs.launchpad.net/heat/+bug/1306294
> > > 
> > > Can you clarify, is the issue either:
> > > 
> > > 1. Create novaclient object with username/password
> > > 2. Do series of operations via the client object which eventually fail
> > > after $n operations due to token expiry
> > > 
> > > or:
> > > 
> > > 1. Create novaclient object with username/password
> > > 2. Some really long operation which means token expires in the course of
> > > the service handling the request, blowing up and 500-ing
> > > 
> > > If the former, then it does sound like a client, or usage-of-client bug,
> > > although note if you pass a *token* vs username/password (as is currently
> > > done for glance and heat in tempest, because we lack the code to get the
> > > token outside of the shell.py code..), there's nothing the client can do,
> > > because you can't request a new token with longer expiry with a token...
> > > 
> > > However if the latter, then it seems like not really a client problem to
> > > solve, as it's hard to know what action to take if a request failed
> > > part-way through and thus things are in an unknown state.
> > > 
> > > This issue is a hard problem, which can possibly be solved by
> > > switching to a trust scoped token (service impersonates the user), but
> > > then
> > > you're effectively bypassing token expiry via delegation which sits
> > > uncomfortably with me (despite the fact that we may have to do this in
> > > heat
> > > > to solve the aforementioned bug)
> > > 
> > > > It seems like we should have a standard pattern that on token
> > > > expiration
> > > > the underlying code at least gives one retry to try to establish a new
> > > > token to complete the flow, however as far as I can tell *no* clients
> > > > do
> > > > this.
> > > 
> > > As has been mentioned, using sessions may be one solution to this, and
> > > AFAIK session support (where it doesn't already exist) is getting into
> > > various clients via the work being carried out to add support for v3
> > > keystone by David Hu:
> > > 
> > > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > > 
> > > I see patches for Heat (currently gating), Nova and Ironic.
> > > 
> > > > > I know we had to add that into Tempest because tempest runs can exceed 1
> > > > > hr, and we want to avoid random fails just because we cross a token
> > > > > expiration boundary.
> > > 
> > > I can't claim great experience with sessions yet, but AIUI you could do
> > > something like:
> > > 
> > > from keystoneclient.auth.identity import v3
> > > from keystoneclient import session
> > > from keystoneclient.v3 import client
> > > 
> > > > auth = v3.Password(auth_url=OS_AUTH_URL,
> > > >                    username=USERNAME,
> > > >                    password=PASSWORD,
> > > >                    project_id=PROJECT,
> > > >                    user_domain_name='

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Jamie Lennox


- Original Message -
> From: "Sean Dague" 
> To: openstack-dev@lists.openstack.org
> Sent: Thursday, 11 September, 2014 9:44:43 PM
> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
> tokens leads to overall OpenStack fragility
> 
> On 09/10/2014 08:46 PM, Jamie Lennox wrote:
> > 
> > - Original Message -
> >> From: "Steven Hardy" 
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> 
> >> Sent: Thursday, September 11, 2014 1:55:49 AM
> >> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
> >> tokens leads to overall OpenStack fragility
> >>
> >> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> >>> Going through the untriaged Nova bugs, and there are a few on a similar
> >>> pattern:
> >>>
> >>> Nova operation in progress takes a while
> >>> Crosses keystone token expiration time
> >>> Timeout thrown
> >>> Operation fails
> >>> Terrible 500 error sent back to user
> >>
> >> We actually have this exact problem in Heat, which I'm currently trying to
> >> solve:
> >>
> >> https://bugs.launchpad.net/heat/+bug/1306294
> >>
> >> Can you clarify, is the issue either:
> >>
> >> 1. Create novaclient object with username/password
> >> 2. Do series of operations via the client object which eventually fail
> >> after $n operations due to token expiry
> >>
> >> or:
> >>
> >> 1. Create novaclient object with username/password
> >> 2. Some really long operation which means token expires in the course of
> >> the service handling the request, blowing up and 500-ing
> >>
> >> If the former, then it does sound like a client, or usage-of-client bug,
> >> although note if you pass a *token* vs username/password (as is currently
> >> done for glance and heat in tempest, because we lack the code to get the
> >> token outside of the shell.py code..), there's nothing the client can do,
> >> because you can't request a new token with longer expiry with a token...
> >>
> >> However if the latter, then it seems like not really a client problem to
> >> solve, as it's hard to know what action to take if a request failed
> >> part-way through and thus things are in an unknown state.
> >>
> >> This issue is a hard problem, which can possibly be solved by
> >> switching to a trust scoped token (service impersonates the user), but then
> >> you're effectively bypassing token expiry via delegation which sits
> >> uncomfortably with me (despite the fact that we may have to do this in heat
> >> to solve the aforementioned bug)
> >>
> >>> It seems like we should have a standard pattern that on token expiration
> >>> the underlying code at least gives one retry to try to establish a new
> >>> token to complete the flow, however as far as I can tell *no* clients do
> >>> this.
> >>
> >> As has been mentioned, using sessions may be one solution to this, and
> >> AFAIK session support (where it doesn't already exist) is getting into
> >> various clients via the work being carried out to add support for v3
> >> keystone by David Hu:
> >>
> >> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> >>
> >> I see patches for Heat (currently gating), Nova and Ironic.
> >>
> >>> I know we had to add that into Tempest because tempest runs can exceed 1
> >>> hr, and we want to avoid random fails just because we cross a token
> >>> expiration boundary.
> >>
> >> I can't claim great experience with sessions yet, but AIUI you could do
> >> something like:
> >>
> >> from keystoneclient.auth.identity import v3
> >> from keystoneclient import session
> >> from keystoneclient.v3 import client
> >>
> >> auth = v3.Password(auth_url=OS_AUTH_URL,
> >>                    username=USERNAME,
> >>                    password=PASSWORD,
> >>                    project_id=PROJECT,
> >>                    user_domain_name='default')
> >> sess = session.Session(auth=auth)
> >> ks = client.Client(session=sess)
> >>
> >> And if you can pass the same session into the various clients tempest
> >> creates then the Password auth-pl

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Steven Hardy
On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
> 
> - Original Message -
> > From: "Steven Hardy" 
> > To: "OpenStack Development Mailing List (not for usage questions)" 
> > 
> > Sent: Thursday, September 11, 2014 1:55:49 AM
> > Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
> > tokens leads to overall OpenStack fragility
> > 
> > On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > > Going through the untriaged Nova bugs, and there are a few on a similar
> > > pattern:
> > > 
> > > Nova operation in progress takes a while
> > > Crosses keystone token expiration time
> > > Timeout thrown
> > > Operation fails
> > > Terrible 500 error sent back to user
> > 
> > We actually have this exact problem in Heat, which I'm currently trying to
> > solve:
> > 
> > https://bugs.launchpad.net/heat/+bug/1306294
> > 
> > Can you clarify, is the issue either:
> > 
> > 1. Create novaclient object with username/password
> > 2. Do series of operations via the client object which eventually fail
> > after $n operations due to token expiry
> > 
> > or:
> > 
> > 1. Create novaclient object with username/password
> > 2. Some really long operation which means token expires in the course of
> > the service handling the request, blowing up and 500-ing
> > 
> > If the former, then it does sound like a client, or usage-of-client bug,
> > although note if you pass a *token* vs username/password (as is currently
> > done for glance and heat in tempest, because we lack the code to get the
> > token outside of the shell.py code..), there's nothing the client can do,
> > because you can't request a new token with longer expiry with a token...
> > 
> > However if the latter, then it seems like not really a client problem to
> > solve, as it's hard to know what action to take if a request failed
> > part-way through and thus things are in an unknown state.
> > 
> > This issue is a hard problem, which can possibly be solved by
> > switching to a trust scoped token (service impersonates the user), but then
> > you're effectively bypassing token expiry via delegation which sits
> > uncomfortably with me (despite the fact that we may have to do this in heat
> > > to solve the aforementioned bug)
> > 
> > > It seems like we should have a standard pattern that on token expiration
> > > the underlying code at least gives one retry to try to establish a new
> > > token to complete the flow, however as far as I can tell *no* clients do
> > > this.
> > 
> > As has been mentioned, using sessions may be one solution to this, and
> > AFAIK session support (where it doesn't already exist) is getting into
> > various clients via the work being carried out to add support for v3
> > keystone by David Hu:
> > 
> > https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> > 
> > I see patches for Heat (currently gating), Nova and Ironic.
> > 
> > > I know we had to add that into Tempest because tempest runs can exceed 1
> > > hr, and we want to avoid random fails just because we cross a token
> > > expiration boundary.
> > 
> > I can't claim great experience with sessions yet, but AIUI you could do
> > something like:
> > 
> > from keystoneclient.auth.identity import v3
> > from keystoneclient import session
> > from keystoneclient.v3 import client
> > 
> > > auth = v3.Password(auth_url=OS_AUTH_URL,
> > >                    username=USERNAME,
> > >                    password=PASSWORD,
> > >                    project_id=PROJECT,
> > >                    user_domain_name='default')
> > sess = session.Session(auth=auth)
> > ks = client.Client(session=sess)
> > 
> > And if you can pass the same session into the various clients tempest
> > creates then the Password auth-plugin code takes care of reauthenticating
> > if the token cached in the auth plugin object is expired, or nearly
> > expired:
> > 
> > https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> > 
> > So in the tempest case, it seems like it may be a case of migrating the
> > code creating the clients to use sessions instead of passing a token or
> > username/password into the client object?
> > 
> > That's my understanding of it atm anyway, hopefully jamielennox will be 
> > along
> > soon wit

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Duncan Thomas
On 11 September 2014 03:17, Angus Lees  wrote:

> (As inspired by eg kerberos)
> 2. Ensure at some environmental/top layer that the advertised token lifetime
> exceeds the timeout set on the request, before making the request.  This
> implies (since there's no special handling in place) failing if the token was
> expired earlier than expected.

We've a related problem in cinder (cinder-backup uses the user's token
to talk to swift, and the backup can easily take longer than the token
expiry time) which could not be solved by this, since the time the
backup takes is unknown (compression, service and resource contention,
etc alter the time by multiple orders of magnitude)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Sean Dague
On 09/10/2014 11:55 AM, Steven Hardy wrote:
> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>> Going through the untriaged Nova bugs, and there are a few on a similar
>> pattern:
>>
>> Nova operation in progress takes a while
>> Crosses keystone token expiration time
>> Timeout thrown
>> Operation fails
>> Terrible 500 error sent back to user
> 
> We actually have this exact problem in Heat, which I'm currently trying to
> solve:
> 
> https://bugs.launchpad.net/heat/+bug/1306294
> 
> Can you clarify, is the issue either:
> 
> 1. Create novaclient object with username/password
> 2. Do series of operations via the client object which eventually fail
> after $n operations due to token expiry
> 
> or:
> 
> 1. Create novaclient object with username/password
> 2. Some really long operation which means token expires in the course of
> the service handling the request, blowing up and 500-ing

From what I can tell of the Nova bugs, both are issues. Honestly, it
would probably be really telling to set up a test env with 10s token
timeouts and see how badly it breaks. I expect that our expiration logic,
and how our components react to it, is actually a lot less coherent than
we believe.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Sean Dague
On 09/10/2014 08:46 PM, Jamie Lennox wrote:
> 
> - Original Message -
>> From: "Steven Hardy" 
>> To: "OpenStack Development Mailing List (not for usage questions)" 
>> 
>> Sent: Thursday, September 11, 2014 1:55:49 AM
>> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
>> tokens leads to overall OpenStack fragility
>>
>> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
>>> Going through the untriaged Nova bugs, and there are a few on a similar
>>> pattern:
>>>
>>> Nova operation in progress takes a while
>>> Crosses keystone token expiration time
>>> Timeout thrown
>>> Operation fails
>>> Terrible 500 error sent back to user
>>
>> We actually have this exact problem in Heat, which I'm currently trying to
>> solve:
>>
>> https://bugs.launchpad.net/heat/+bug/1306294
>>
>> Can you clarify, is the issue either:
>>
>> 1. Create novaclient object with username/password
>> 2. Do series of operations via the client object which eventually fail
>> after $n operations due to token expiry
>>
>> or:
>>
>> 1. Create novaclient object with username/password
>> 2. Some really long operation which means token expires in the course of
>> the service handling the request, blowing up and 500-ing
>>
>> If the former, then it does sound like a client, or usage-of-client bug,
>> although note if you pass a *token* vs username/password (as is currently
>> done for glance and heat in tempest, because we lack the code to get the
>> token outside of the shell.py code..), there's nothing the client can do,
>> because you can't request a new token with longer expiry with a token...
>>
>> However if the latter, then it seems like not really a client problem to
>> solve, as it's hard to know what action to take if a request failed
>> part-way through and thus things are in an unknown state.
>>
>> This issue is a hard problem, which can possibly be solved by
>> switching to a trust scoped token (service impersonates the user), but then
>> you're effectively bypassing token expiry via delegation which sits
>> uncomfortably with me (despite the fact that we may have to do this in heat
>> to solve the aforementioned bug)
>>
>>> It seems like we should have a standard pattern that on token expiration
>>> the underlying code at least gives one retry to try to establish a new
>>> token to complete the flow, however as far as I can tell *no* clients do
>>> this.
>>
>> As has been mentioned, using sessions may be one solution to this, and
>> AFAIK session support (where it doesn't already exist) is getting into
>> various clients via the work being carried out to add support for v3
>> keystone by David Hu:
>>
>> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
>>
>> I see patches for Heat (currently gating), Nova and Ironic.
>>
>>> I know we had to add that into Tempest because tempest runs can exceed 1
>>> hr, and we want to avoid random fails just because we cross a token
>>> expiration boundary.
>>
>> I can't claim great experience with sessions yet, but AIUI you could do
>> something like:
>>
>> from keystoneclient.auth.identity import v3
>> from keystoneclient import session
>> from keystoneclient.v3 import client
>>
>> auth = v3.Password(auth_url=OS_AUTH_URL,
>>                    username=USERNAME,
>>                    password=PASSWORD,
>>                    project_id=PROJECT,
>>                    user_domain_name='default')
>> sess = session.Session(auth=auth)
>> ks = client.Client(session=sess)
>>
>> And if you can pass the same session into the various clients tempest
>> creates then the Password auth-plugin code takes care of reauthenticating
>> if the token cached in the auth plugin object is expired, or nearly
>> expired:
>>
>> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
>>
>> So in the tempest case, it seems like it may be a case of migrating the
>> code creating the clients to use sessions instead of passing a token or
>> username/password into the client object?
>>
>> That's my understanding of it atm anyway, hopefully jamielennox will be along
>> soon with more details :)
>>
>> Steve
> 
> 
> By clients here are you referring to the CLIs or the python libraries? 
> Implementation is at diff

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Angus Lees
On Wed, 10 Sep 2014 10:14:32 AM Sean Dague wrote:
> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
> 
> Nova operation in progress takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user
> 
> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.

This came up in conversation a few weeks ago in the context of the ironic 
client.  I've read some docs and written a keystone client, but I'm not 
super-familiar with keystone internals, so apologies if I miss something 
fundamental.


There are two broadly different approaches to dealing with this:

(As described by Sean, and implemented in a few clients)
1. At the bottom layer, try to refresh the token and immediately retry 
whenever a server response indicates the token has expired.

(As inspired by e.g. Kerberos)
2. Ensure at some environmental/top layer that the advertised token lifetime 
exceeds the timeout set on the request, before making the request.  This 
implies (since there's no special handling in place) failing if the token was 
expired earlier than expected.


The primary distinction being that in (2) the client is ignorant of how to 
create tokens, and just assumes they're valid.

(2) is particularly easy to code for simple "one shot" command line clients.  
For a persistent client, the easiest approach is probably to have an 
asynchronous loop that just keeps refreshing the stored token whenever it 
approaches "expiry - max_single_request_timeout".
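
A minimal sketch of the refresh timing for (2), in case it helps. The function 
names and the example values are hypothetical and not taken from any existing 
client; it only shows the "expiry - max_single_request_timeout" arithmetic that 
a background refresh loop would check against.

```python
import datetime


def next_refresh_time(expires_at, max_single_request_timeout):
    """Return the moment at which a background loop should fetch a new token.

    Refreshing at ``expiry - max_single_request_timeout`` means a request
    started with the cached token can time out before the token expires.
    """
    return expires_at - max_single_request_timeout


def needs_refresh(now, expires_at, max_single_request_timeout):
    """True once a newly started request could outlive the cached token."""
    return now >= next_refresh_time(expires_at, max_single_request_timeout)


# Hypothetical values: a one-hour token and a 60 second request timeout.
issued = datetime.datetime(2014, 9, 10, 12, 0, 0)
expires = issued + datetime.timedelta(hours=1)
timeout = datetime.timedelta(seconds=60)

print(next_refresh_time(expires, timeout))
print(needs_refresh(issued + datetime.timedelta(minutes=30), expires, timeout))
print(needs_refresh(issued + datetime.timedelta(minutes=59, seconds=30),
                    expires, timeout))
```

A persistent client would call needs_refresh() from its loop and swap in a new 
token when it returns True; a one-shot CLI can make the same check once before 
sending its request.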

My concern with (1) is that it involves passing username/password all the way 
down to the bottom layers - see the heat example where this means crossing 
into another program/service.  Moreover, if the token was expired earlier than 
advertised it probably means the admin has deliberately rejected the user or 
something and the intent is that they _should_ be locked out - it would be 
unfortunate to have a synchronised retry attack on keystone from all the 
rejected clients at that point :/

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Jamie Lennox

- Original Message -
> From: "Steven Hardy" 
> To: "OpenStack Development Mailing List (not for usage questions)" 
> 
> Sent: Thursday, September 11, 2014 1:55:49 AM
> Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
> tokens leads to overall OpenStack fragility
> 
> On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> > Going through the untriaged Nova bugs, and there are a few on a similar
> > pattern:
> > 
> > Nova operation in progress takes a while
> > Crosses keystone token expiration time
> > Timeout thrown
> > Operation fails
> > Terrible 500 error sent back to user
> 
> We actually have this exact problem in Heat, which I'm currently trying to
> solve:
> 
> https://bugs.launchpad.net/heat/+bug/1306294
> 
> Can you clarify, is the issue either:
> 
> 1. Create novaclient object with username/password
> 2. Do series of operations via the client object which eventually fail
> after $n operations due to token expiry
> 
> or:
> 
> 1. Create novaclient object with username/password
> 2. Some really long operation which means token expires in the course of
> the service handling the request, blowing up and 500-ing
> 
> If the former, then it does sound like a client, or usage-of-client bug,
> although note if you pass a *token* vs username/password (as is currently
> done for glance and heat in tempest, because we lack the code to get the
> token outside of the shell.py code..), there's nothing the client can do,
> because you can't request a new token with longer expiry with a token...
> 
> However if the latter, then it seems like not really a client problem to
> solve, as it's hard to know what action to take if a request failed
> part-way through and thus things are in an unknown state.
> 
> This issue is a hard problem, which can possibly be solved by
> switching to a trust scoped token (service impersonates the user), but then
> you're effectively bypassing token expiry via delegation which sits
> uncomfortably with me (despite the fact that we may have to do this in heat
> to solve the aforementioned bug)
> 
> > It seems like we should have a standard pattern that on token expiration
> > the underlying code at least gives one retry to try to establish a new
> > token to complete the flow, however as far as I can tell *no* clients do
> > this.
> 
> As has been mentioned, using sessions may be one solution to this, and
> AFAIK session support (where it doesn't already exist) is getting into
> various clients via the work being carried out to add support for v3
> keystone by David Hu:
> 
> https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
> 
> I see patches for Heat (currently gating), Nova and Ironic.
> 
> > I know we had to add that into Tempest because tempest runs can exceed 1
> > hr, and we want to avoid random fails just because we cross a token
> > expiration boundary.
> 
> I can't claim great experience with sessions yet, but AIUI you could do
> something like:
> 
> from keystoneclient.auth.identity import v3
> from keystoneclient import session
> from keystoneclient.v3 import client
> 
> auth = v3.Password(auth_url=OS_AUTH_URL,
>                    username=USERNAME,
>                    password=PASSWORD,
>                    project_id=PROJECT,
>                    user_domain_name='default')
> sess = session.Session(auth=auth)
> ks = client.Client(session=sess)
> 
> And if you can pass the same session into the various clients tempest
> creates then the Password auth-plugin code takes care of reauthenticating
> if the token cached in the auth plugin object is expired, or nearly
> expired:
> 
> https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
> 
> So in the tempest case, it seems like it may be a case of migrating the
> code creating the clients to use sessions instead of passing a token or
> username/password into the client object?
> 
> That's my understanding of it atm anyway, hopefully jamielennox will be along
> soon with more details :)
> 
> Steve


By clients here are you referring to the CLIs or the python libraries? 
Implementation is at different points with each. 

Sessions will handle automatically reauthenticating and retrying a request; 
however, this relies on the service returning a 401 Unauthorized error. If a 
service is returning a 500 (or a timeout?) then there isn't much that a client 
can/should do about that, because we can't assume that trying again with a new 
token will solve anything. 
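
To illustrate the distinction, here is a rough sketch of the retry rule only. 
It is not the keystoneclient session code: FakeAuth, make_service and 
request_with_reauth are hypothetical stand-ins, and the "service" is just a 
callable that returns a status code.

```python
class FakeAuth(object):
    """Hypothetical auth plugin: hands out numbered tokens."""

    def __init__(self):
        self.issued = 0

    def get_token(self):
        self.issued += 1
        return "token-%d" % self.issued


def request_with_reauth(send, auth, token):
    """Send a request; on a 401, fetch a new token and retry exactly once.

    ``send`` is any callable taking a token and returning an HTTP status.
    Any other status, including a 500, is returned to the caller untouched.
    """
    status = send(token)
    if status == 401:
        token = auth.get_token()
        status = send(token)
    return status, token


def make_service(valid_token):
    """Hypothetical service that accepts only ``valid_token``."""
    def send(token):
        return 200 if token == valid_token else 401
    return send


auth = FakeAuth()
# The cached token is stale; the service accepts the next one issued.
print(request_with_reauth(make_service("token-1"), auth, "stale"))
# A 500 is not retried, and no new token is requested.
print(request_with_reauth(lambda token: 500, auth, "token-1"))
```

The retry is deliberately limited to one attempt, so a token that keeps being 
rejected surfaces as a 401 to the caller rather than looping against keystone.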

At the moment we have keystoneclient, novaclient, cinde

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Steven Hardy
On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
> 
> Nova operation in progress takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user

We actually have this exact problem in Heat, which I'm currently trying to
solve:

https://bugs.launchpad.net/heat/+bug/1306294

Can you clarify, is the issue either:

1. Create novaclient object with username/password
2. Do series of operations via the client object which eventually fail
after $n operations due to token expiry

or:

1. Create novaclient object with username/password
2. Some really long operation which means token expires in the course of
the service handling the request, blowing up and 500-ing

If the former, then it does sound like a client, or usage-of-client bug,
although note if you pass a *token* vs username/password (as is currently
done for glance and heat in tempest, because we lack the code to get the
token outside of the shell.py code..), there's nothing the client can do,
because you can't request a new token with longer expiry with a token...

However if the latter, then it seems like not really a client problem to
solve, as it's hard to know what action to take if a request failed
part-way through and thus things are in an unknown state.

This issue is a hard problem, which can possibly be solved by
switching to a trust scoped token (service impersonates the user), but then
you're effectively bypassing token expiry via delegation which sits
uncomfortably with me (despite the fact that we may have to do this in heat
to solve the aforementioned bug)

> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.

As has been mentioned, using sessions may be one solution to this, and
AFAIK session support (where it doesn't already exist) is getting into
various clients via the work being carried out to add support for v3
keystone by David Hu:

https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

I see patches for Heat (currently gating), Nova and Ironic.

> I know we had to add that into Tempest because tempest runs can exceed 1
> hr, and we want to avoid random fails just because we cross a token
> expiration boundary.

I can't claim great experience with sessions yet, but AIUI you could do
something like:

from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

auth = v3.Password(auth_url=OS_AUTH_URL,
                   username=USERNAME,
                   password=PASSWORD,
                   project_id=PROJECT,
                   user_domain_name='default')
sess = session.Session(auth=auth)
ks = client.Client(session=sess)

And if you can pass the same session into the various clients tempest
creates then the Password auth-plugin code takes care of reauthenticating
if the token cached in the auth plugin object is expired, or nearly
expired:

https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120

So in the tempest case, it seems like it may be a case of migrating the
code creating the clients to use sessions instead of passing a token or
username/password into the client object?
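
For what it's worth, the "expired, or nearly expired" behaviour can be 
sketched without any OpenStack code at all. Everything below is a hypothetical 
stand-in (CachingAuth, FakeClock, fetch), and the 30 second window is an 
assumed value rather than whatever keystoneclient actually uses. The point is 
only that clients sharing one auth object share one cached token, and that it 
is replaced shortly before it expires.

```python
import datetime

# Assumed margin before expiry at which the token counts as "nearly expired".
STALE_WINDOW = datetime.timedelta(seconds=30)


class CachingAuth(object):
    """Hypothetical auth plugin that caches a token and re-auths lazily."""

    def __init__(self, fetch_token, clock):
        self._fetch = fetch_token
        self._clock = clock
        self._token = None
        self._expires = None
        self.auth_count = 0

    def get_token(self):
        now = self._clock()
        if self._token is None or now + STALE_WINDOW >= self._expires:
            # No token yet, or it is expired / nearly expired: authenticate.
            self._token, self._expires = self._fetch(now)
            self.auth_count += 1
        return self._token


class FakeClock(object):
    """Controllable clock so the example does not have to sleep."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, **kwargs):
        self.now += datetime.timedelta(**kwargs)


def fetch(now):
    """Pretend identity service: issues a one-hour token."""
    return "token-issued-%s" % now.isoformat(), now + datetime.timedelta(hours=1)


clock = FakeClock(datetime.datetime(2014, 9, 10, 12, 0, 0))
auth = CachingAuth(fetch, clock)

first = auth.get_token()
clock.advance(minutes=30)
second = auth.get_token()               # still the cached token
clock.advance(minutes=29, seconds=45)   # 15 seconds before expiry
third = auth.get_token()                # inside the stale window: re-auth

print(first == second, second == third, auth.auth_count)
```

Any number of service clients could hold the same auth object here, which is 
the property that passing one session into all the clients is meant to give.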

That's my understanding of it atm anyway, hopefully jamielennox will be along
soon with more details :)

Steve



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Sean Dague
Do we know which versions of the clients do that?

-Sean

On 09/10/2014 10:22 AM, Endre Karlson wrote:
> I think at least clients supporting keystone sessions that are
> configured to use the auth.Password mech support this since re-auth is
> done by the session rather than the service client itself.
> 
> 2014-09-10 16:14 GMT+02:00 Sean Dague:
> 
> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
> 
> Nova operation in progress takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user
> 
> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.
> 
> I know we had to add that into Tempest because tempest runs can exceed 1
> hr, and we want to avoid random fails just because we cross a token
> expiration boundary.
> 
> Anyone closer to the clients that can comment here?
> 
> -Sean
> 
> --
> Sean Dague
> http://dague.net


-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Endre Karlson
I think at least clients supporting keystone sessions that are configured
to use the auth.Password mech support this since re-auth is done by the
session rather than the service client itself.

2014-09-10 16:14 GMT+02:00 Sean Dague:

> Going through the untriaged Nova bugs, and there are a few on a similar
> pattern:
>
> Nova operation in progress takes a while
> Crosses keystone token expiration time
> Timeout thrown
> Operation fails
> Terrible 500 error sent back to user
>
> It seems like we should have a standard pattern that on token expiration
> the underlying code at least gives one retry to try to establish a new
> token to complete the flow, however as far as I can tell *no* clients do
> this.
>
> I know we had to add that into Tempest because tempest runs can exceed 1
> hr, and we want to avoid random fails just because we cross a token
> expiration boundary.
>
> Anyone closer to the clients that can comment here?
>
> -Sean
>
> --
> Sean Dague
> http://dague.net


[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Sean Dague
Going through the untriaged Nova bugs, and there are a few on a similar
pattern:

Nova operation in progress takes a while
Crosses keystone token expiration time
Timeout thrown
Operation fails
Terrible 500 error sent back to user

It seems like we should have a standard pattern that on token expiration
the underlying code at least gives one retry to try to establish a new
token to complete the flow, however as far as I can tell *no* clients do
this.

I know we had to add that into Tempest because tempest runs can exceed 1
hr, and we want to avoid random fails just because we cross a token
expiration boundary.

Anyone closer to the clients that can comment here?

-Sean

-- 
Sean Dague
http://dague.net
