Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-17 Thread Duncan Thomas
On 16 September 2014 01:28, Nathan Kinder nkin...@redhat.com wrote:
 The idea would be to leave normal tokens with a smaller validity period
 (like the current default of an hour), but also allow one-time use
 tokens to be requested.

Cinder backup makes many requests to swift during a backup, one per
chunk to be uploaded plus one or more for the metadata file.



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-17 Thread Angus Lees
On Wed, 17 Sep 2014 04:53:28 PM Duncan Thomas wrote:
 On 16 September 2014 01:28, Nathan Kinder nkin...@redhat.com wrote:
  The idea would be to leave normal tokens with a smaller validity period
  (like the current default of an hour), but also allow one-time use
  tokens to be requested.
 
 Cinder backup makes many requests to swift during a backup, one per
 chunk to be uploaded plus one or more for the metadata file.

Right, and what if the HTTP connection times out and needs to be retried? Can
I reuse my single-use token?

Also: single-use tokens scale badly since they need a strongly consistent 
validation point that in normal use requires frequent writes.

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-16 Thread Adam Young

On 09/15/2014 08:28 PM, Nathan Kinder wrote:


On 09/12/2014 12:46 AM, Angus Lees wrote:

On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:

On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:

For service to service communication there are two types.
1) using the user's token like nova-cinder. If this token expires there
is really nothing that nova can do except raise 401 and make the client
do it again. 2) using a service user like nova-neutron. This should
allow automatic reauthentication and will be fixed/standardized by
sessions.

(1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
least) there seem to be two solutions, neither of which I particularly
like:

- Require username/password to be passed into the service (something we've
   been trying to banish via migrating to trusts for deferred
   authentication)
- Create a trust, and impersonate the user for the duration of the request,
   or after the token expires until it is completed, using the service user
   credentials and the trust_id.

It's the second one which I'm deliberating over - technically it will work,
and we create the trust anyway (e.g for later use to do autoscaling etc),
but can anyone from the keystone team comment on the legitimacy of the
approach?

Intuitively it seems wrong, but I can't see any other way if we want to
support token-only auth and cope with folks doing stuff which takes 2 hours
with a 1 hour token expiry?

A possible 3rd option is some sort of longer lived, but limited scope
capability token.

The user would create a capability token that represents "anyone possessing
this token is (e.g.) allowed to write to swift as $user".  The token could be
created by keystone as a trusted 3rd party or by swift (doesn't matter which),
in response to a request authenticated as $user.  The client then includes
that token in the request *to cinder*, so cinder can pass it back to swift
when doing the writes.
This capability token would be of much longer duration (long enough to
complete the cinder-swift task), which is ok because it is of a much more
limited scope (ideally as fine grained as we can bother implementing).

With UUID tokens, it would even be possible to implement a one-time
use sort of token.  Since Keystone needs to be asked to validate a UUID
token, the token could be invalidated by Keystone after the first
verification.  Since the token is limited by its number of uses, there
should be fewer concerns about a long validity period (though it would
still make sense to use something sane).  This approach
wouldn't be possible with PKI tokens since Keystone is not in the
validation path.

Your idea of passing the capability token in the request would work
well with this, as the token only needs to be extracted and used once
instead of being passed from service to service and validated at each
hop (user -> cinder -> swift in your example).

The idea would be to leave normal tokens with a smaller validity period
(like the current default of an hour), but also allow one-time use
tokens to be requested.


It is dumb to make a service get a token just to hand it back to
Keystone.


Guang Yee has pushed for years to get a capability into Keystone where
certain API calls would not require a token; instead, the permission would
be based on whatever the user's capabilities were at the time.


The problem is that Admin in the default policy (and hardcoded in V2)
is defined to mean "user has the role admin on anything", which is, of
course, suboptimal (to say the least).


So validating a token should not require a token.  We could add to the
request some standard stanza saying "here is the project/domain that I want
to do this with", so that we can at least keep Keystone's current behavior
somewhat sane.

(I like this option)


A 4th option is to have much longer lived tokens everywhere (long enough for
this backup), but the user is able to expire it early via keystone whenever
they feel it might be compromised (aiui this is exactly how things work now -
we just need to increase the timeout).  Greater exposure to replay attacks,
but if detected they can still be invalidated quickly.

(This is the easiest option, it's basically just formalising what the
operators are already doing)


A 5th option (wow) is to have the end user/client repeatedly push in fresh
tokens during long-running operations (and heat is the uber-example since it
basically wants to impersonate the user forever).  Those tokens would then
need to be refreshed all the way down the stack for any outstanding operations
that might need the new token.

(This or the 4th option seems ugly but unavoidable for forever services like
heat.  There has to be some way to invalidate their access if they go rogue,
either by time (and thus needs a refresh mechanism) or by invalidation-via-
keystone (which implies the token lasts forever unless invalidated))

I think Keystone trusts are better for forever 

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-15 Thread Brant Knudson
On Wed, Sep 10, 2014 at 9:14 AM, Sean Dague s...@dague.net wrote:

 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:

 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

 Anyone closer to the clients that can comment here?

 -Sean


Currently, a service holding a token can't always get a new one, because
the service doesn't always have the user's credentials (which is good...
the service shouldn't have the user's credentials), and even if the
credentials were available, the service might not be able to use them to
authenticate (not all authentication is done using username and password).

The most obvious solution to me is to have the identity server provide an
API where, given a token, you can get a new token with an expiration time
of your choice. Use of the API would be limited to service users. When a
service gets a token that it wants to send on to another service, it first
uses the existing token to get a new token with whatever expiration time it
thinks would be adequate. If the service knows that it's done with the
token, it will hopefully revoke the new token to keep the token database
clean.

The only thing missing from the existing auth API for getting a token from
a token is being able to set the expiration time --
https://github.com/openstack/identity-api/blob/master/v3/src/markdown/identity-api-v3.md#authentication-authentication
. Keystone would also have to be enhanced to validate that, if a
token-from-token request asks for a new expiration time, the requestor has
the required role.
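
As a rough illustration of the request shape, here is a sketch using the
existing v3 token-from-token authentication with a hypothetical expires_at
field bolted on -- that field is exactly the missing piece described above;
everything else mirrors the documented /v3/auth/tokens call, and
extend_token is just an illustrative helper name:

import json
import requests

def extend_token(v3_auth_url, existing_token, expires_at):
    """Sketch: exchange a token for a new one with a caller-chosen expiry.

    v3_auth_url is the versioned Keystone endpoint, e.g. http://host:5000/v3
    """
    body = {
        "auth": {
            "identity": {
                "methods": ["token"],
                "token": {"id": existing_token},
            },
            # Hypothetical addition -- the expiry override proposed above.
            "expires_at": expires_at,
        }
    }
    resp = requests.post(v3_auth_url + "/auth/tokens",
                         data=json.dumps(body),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    # Keystone returns the new token id in the X-Subject-Token header.
    return resp.headers["X-Subject-Token"]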

- Brant


Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-15 Thread Nathan Kinder


On 09/12/2014 12:46 AM, Angus Lees wrote:
 On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:
 On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
 For service to service communication there are two types.
 1) using the user's token like nova-cinder. If this token expires there
 is really nothing that nova can do except raise 401 and make the client
 do it again. 2) using a service user like nova-neutron. This should
 allow automatic reauthentication and will be fixed/standardized by
 sessions.
 (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
 least) there seem to be two solutions, neither of which I particularly
 like:

 - Require username/password to be passed into the service (something we've
   been trying to banish via migrating to trusts for deferred
   authentication)
 - Create a trust, and impersonate the user for the duration of the request,
   or after the token expires until it is completed, using the service user
   credentials and the trust_id.

 It's the second one which I'm deliberating over - technically it will work,
 and we create the trust anyway (e.g for later use to do autoscaling etc),
 but can anyone from the keystone team comment on the legitimacy of the
 approach?

 Intuitively it seems wrong, but I can't see any other way if we want to
 support token-only auth and cope with folks doing stuff which takes 2 hours
 with a 1 hour token expiry?
 
 A possible 3rd option is some sort of longer lived, but limited scope 
 capability token.
 
 The user would create a capability token that represents "anyone possessing 
 this token is (e.g.) allowed to write to swift as $user".  The token could be 
 created by keystone as a trusted 3rd party or by swift (doesn't matter 
 which), 
 in response to a request authenticated as $user.  The client then includes 
 that token in the request *to cinder*, so cinder can pass it back to swift 
 when doing the writes.
 This capability token would be of much longer duration (long enough to 
 complete the cinder-swift task), which is ok because it is of a much more 
 limited scope (ideally as fine grained as we can bother implementing).

With UUID tokens, it would even be possible to implement a one-time
use sort of token.  Since Keystone needs to be asked to validate a UUID
token, the token could be invalidated by Keystone after the first
verification.  Since the token is limited by its number of uses, there
should be fewer concerns about a long validity period (though it would
still make sense to use something sane).  This approach
wouldn't be possible with PKI tokens since Keystone is not in the
validation path.
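
Purely as a conceptual sketch (this is not Keystone code), the
validate-and-consume step for such a one-time-use UUID token might look like
the following; note that every successful validation is also a write, which
is where the consistency/scale trade-off raised elsewhere in this thread
comes from:

class TokenAlreadyUsed(Exception):
    pass

def validate_one_time_token(token_id, store):
    """Conceptual sketch: validate a one-time-use token and mark it spent.

    'store' stands in for whatever strongly consistent backend holds the
    token records (a plain dict here, purely for illustration).
    """
    record = store.get(token_id)
    if record is None or record.get('consumed'):
        raise TokenAlreadyUsed(token_id)
    record['consumed'] = True   # consuming write on every successful validation
    store[token_id] = record
    return record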

Your idea of passing the capability token in the request would work
well with this, as the token only needs to be extracted and used once
instead of being passed from service to service and validated at each
hop (user -> cinder -> swift in your example).

The idea would be to leave normal tokens with a smaller validity period
(like the current default of an hour), but also allow one-time use
tokens to be requested.

 
 (I like this option)
 
 
 A 4th option is to have much longer lived tokens everywhere (long enough for 
 this backup), but the user is able to expire it early via keystone whenever 
 they feel it might be compromised (aiui this is exactly how things work now - 
 we just need to increase the timeout).  Greater exposure to replay attacks, 
 but if detected they can still be invalidated quickly.
 
 (This is the easiest option, it's basically just formalising what the 
 operators are already doing)
 
 
 A 5th option (wow) is to have the end user/client repeatedly push in fresh 
 tokens during long-running operations (and heat is the uber-example since it 
 basically wants to impersonate the user forever).  Those tokens would then 
 need to be refreshed all the way down the stack for any outstanding 
 operations 
 that might need the new token.
 
 (This or the 4th option seems ugly but unavoidable for forever services 
 like 
 heat.  There has to be some way to invalidate their access if they go rogue, 
 either by time (and thus needs a refresh mechanism) or by invalidation-via-
 keystone (which implies the token lasts forever unless invalidated))

I think Keystone trusts are better for forever services, though I see
no reason why a trust token also couldn't have a limited number of uses
with a longer validity period.  The trust itself doesn't need an
expiration, so the trust can be executed at some future point in time to
get a limited use trust token.
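
A rough sketch of that, using the keystoneclient v3 trusts API (illustrative
only -- names like OS_AUTH_URL, USER_ID and SERVICE_USER are placeholders,
and this is not Heat's actual code): the trust is created once with no
expires_at, and the service later gets ordinary short-lived trust-scoped
tokens from it on demand.

from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

# The trustor (end user) delegates a role to the trustee (service user).
# No expires_at is given, so the trust itself does not expire.
user_sess = session.Session(auth=v3.Password(
    auth_url=OS_AUTH_URL, username=USERNAME, password=PASSWORD,
    project_id=PROJECT, user_domain_name='default'))
user_ks = client.Client(session=user_sess)
trust = user_ks.trusts.create(trustor_user=USER_ID,
                              trustee_user=SERVICE_USER_ID,
                              project=PROJECT,
                              role_names=['Member'],
                              impersonation=True)

# Later (possibly much later) the service authenticates as itself but scoped
# to the trust, getting a normal, expiring trust-scoped token each time.
service_auth = v3.Password(auth_url=OS_AUTH_URL,
                           username=SERVICE_USER,
                           password=SERVICE_PASSWORD,
                           user_domain_name='default',
                           trust_id=trust.id)
service_sess = session.Session(auth=service_auth)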

 
 
 However we do it:  the permission to do the action should come from the 
 original user - and this is expressed as tokens coming from the original 
 client/user in some form.   By allowing services to create something without 
 the original client/user being involved, we're really just bypassing the 
 token 
 authentication mechanism (and there are easier ways to ignore the token ;)

Yeah, this is ugly.  You give up any control you have 

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Angus Lees
On Thu, 11 Sep 2014 03:00:02 PM Duncan Thomas wrote:
 On 11 September 2014 03:17, Angus Lees g...@inodes.org wrote:
  (As inspired by eg kerberos)
  2. Ensure at some environmental/top layer that the advertised token
  lifetime exceeds the timeout set on the request, before making the
  request.  This implies (since there's no special handling in place)
  failing if the token was expired earlier than expected.
 
 We've a related problem in cinder (cinder-backup uses the user's token
 to talk to swift, and the backup can easily take longer than the token
 expiry time) which could not be solved by this, since the time the
 backup takes is unknown (compression, service and resource contention,
 etc alter the time by multiple orders of magnitude)

Yes, this sounds like another example of the cross-service problem I was 
describing with refreshing the token at the bottom layer - but I disagree that 
this is handled any better by refreshing tokens on-demand at the bottom layer.

In order to have cinder refresh the token while talking to swift, it needs to 
know the user's password (ouch - why even have the token) or have magic token 
creating powers (in which case none of this matters, because cinder can just 
create tokens any time it wants).

As far as I can see, we either need to be able to 1) generate tokens that _do_ 
last long enough, 2) pass user+password to cinder so it is capable of 
creating new tokens as necessary, or 3) only perform token-based auth once at 
the start of a long cinder-glance workflow like this, and then use some sort 
of limited-scope-but-unlimited-time session token for follow-on requests.

I think I'm advocating for (1) or (3), and (2) as a distant third.


... Unless there's some other option here?  Your dismissal above sounded like 
there was already a solution for this - what's the current solution?

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Angus Lees
On Thu, 11 Sep 2014 03:21:52 PM Steven Hardy wrote:
 On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
  For service to service communication there are two types.
  1) using the user's token like nova-cinder. If this token expires there
  is really nothing that nova can do except raise 401 and make the client
  do it again. 2) using a service user like nova-neutron. This should
  allow automatic reauthentication and will be fixed/standardized by
  sessions.
 (1) is the problem I'm trying to solve in bug #1306294, and (for Heat at
 least) there seem to be two solutions, neither of which I particularly
 like:
 
 - Require username/password to be passed into the service (something we've
   been trying to banish via migrating to trusts for deferred
   authentication)
 - Create a trust, and impersonate the user for the duration of the request,
   or after the token expires until it is completed, using the service user
   credentials and the trust_id.
 
 It's the second one which I'm deliberating over - technically it will work,
 and we create the trust anyway (e.g for later use to do autoscaling etc),
 but can anyone from the keystone team comment on the legitimacy of the
 approach?
 
 Intuitively it seems wrong, but I can't see any other way if we want to
 support token-only auth and cope with folks doing stuff which takes 2 hours
 with a 1 hour token expiry?

A possible 3rd option is some sort of longer lived, but limited scope 
capability token.

The user would create a capability token that represents "anyone possessing 
this token is (e.g.) allowed to write to swift as $user".  The token could be 
created by keystone as a trusted 3rd party or by swift (doesn't matter which), 
in response to a request authenticated as $user.  The client then includes 
that token in the request *to cinder*, so cinder can pass it back to swift 
when doing the writes.
This capability token would be of much longer duration (long enough to 
complete the cinder-swift task), which is ok because it is of a much more 
limited scope (ideally as fine grained as we can bother implementing).
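
Swift's existing tempurl middleware is close in spirit to this: a
longer-lived capability scoped to one method, one object path, and an
expiry, signed with a secret and redeemable without talking to Keystone
again.  A sketch of generating such a signature (standard tempurl scheme,
offered only as an analogy to the capability-token idea; the path and key
values are placeholders):

import hmac
import time
from hashlib import sha1

method = 'PUT'                                # the only verb this capability allows
expires = int(time.time() + 4 * 3600)         # long enough to finish the backup
path = '/v1/AUTH_account/backups/chunk-0001'  # the only object it allows
key = 'SECRET_TEMPURL_KEY'                    # set via X-Account-Meta-Temp-URL-Key

hmac_body = '%s\n%s\n%s' % (method, expires, path)
sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()

capability_url = ('https://swift.example.com%s?temp_url_sig=%s&temp_url_expires=%s'
                  % (path, sig, expires))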

(I like this option)


A 4th option is to have much longer lived tokens everywhere (long enough for 
this backup), but the user is able to expire it early via keystone whenever 
they feel it might be compromised (aiui this is exactly how things work now - 
we just need to increase the timeout).  Greater exposure to replay attacks, 
but if detected they can still be invalidated quickly.

(This is the easiest option, it's basically just formalising what the 
operators are already doing)


A 5th option (wow) is to have the end user/client repeatedly push in fresh 
tokens during long-running operations (and heat is the uber-example since it 
basically wants to impersonate the user forever).  Those tokens would then 
need to be refreshed all the way down the stack for any outstanding operations 
that might need the new token.

(This or the 4th option seems ugly but unavoidable for forever services like 
heat.  There has to be some way to invalidate their access if they go rogue, 
either by time (and thus needs a refresh mechanism) or by invalidation-via-
keystone (which implies the token lasts forever unless invalidated))


However we do it:  the permission to do the action should come from the 
original user - and this is expressed as tokens coming from the original 
client/user in some form.   By allowing services to create something without 
the original client/user being involved, we're really just bypassing the token 
authentication mechanism (and there are easier ways to ignore the token ;)

-- 
 - Gus



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Flavio Percoco
On 09/11/2014 01:44 PM, Sean Dague wrote:
 On 09/10/2014 08:46 PM, Jamie Lennox wrote:

 - Original Message -
 From: Steven Hardy sha...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, September 11, 2014 1:55:49 AM
 Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
 tokens leads to overall OpenStack fragility

 On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:

 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

 We actually have this exact problem in Heat, which I'm currently trying to
 solve:

 https://bugs.launchpad.net/heat/+bug/1306294

 Can you clarify, is the issue either:

 1. Create novaclient object with username/password
 2. Do series of operations via the client object which eventually fail
 after $n operations due to token expiry

 or:

 1. Create novaclient object with username/password
 2. Some really long operation which means token expires in the course of
 the service handling the request, blowing up and 500-ing

 If the former, then it does sound like a client, or usage-of-client bug,
 although note if you pass a *token* vs username/password (as is currently
 done for glance and heat in tempest, because we lack the code to get the
 token outside of the shell.py code..), there's nothing the client can do,
 because you can't request a new token with longer expiry with a token...

 However if the latter, then it seems like not really a client problem to
 solve, as it's hard to know what action to take if a request failed
 part-way through and thus things are in an unknown state.

 This issue is a hard problem, which can possibly be solved by
 switching to a trust scoped token (service impersonates the user), but then
 you're effectively bypassing token expiry via delegation which sits
 uncomfortably with me (despite the fact that we may have to do this in heat
 to solve the aforementioned bug)

 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

 As has been mentioned, using sessions may be one solution to this, and
 AFAIK session support (where it doesn't already exist) is getting into
 various clients via the work being carried out to add support for v3
 keystone by David Hu:

 https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

 I see patches for Heat (currently gating), Nova and Ironic.

 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

 I can't claim great experience with sessions yet, but AIUI you could do
 something like:

 from keystoneclient.auth.identity import v3
 from keystoneclient import session
 from keystoneclient.v3 import client

 auth = v3.Password(auth_url=OS_AUTH_URL,
username=USERNAME,
password=PASSWORD,
project_id=PROJECT,
user_domain_name='default')
 sess = session.Session(auth=auth)
 ks = client.Client(session=sess)

 And if you can pass the same session into the various clients tempest
 creates then the Password auth-plugin code takes care of reauthenticating
 if the token cached in the auth plugin object is expired, or nearly
 expired:

 https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120

 So in the tempest case, it seems like it may be a case of migrating the
 code creating the clients to use sessions instead of passing a token or
 username/password into the client object?

 That's my understanding of it atm anyway, hopefully jamielennox will be 
 along
 soon with more details :)

 Steve


 By clients here are you referring to the CLIs or the python libraries? 
 Implementation is at different points with each. 

 Sessions will handle automatically reauthenticating and retrying a request, 
 however it relies on the service throwing a 401 Unauthenticated error. If a 
 service is returning a 500 (or a timeout?) then there isn't much that a 
 client can/should do for that because we can't assume that trying again with 
 a new token will solve anything. 

 At the moment we have keystoneclient, novaclient, cinderclient, neutronclient, 
 and then a number of the smaller projects with support for sessions. That 
 obviously doesn't mean that existing users of that code have transitioned to 
 the newer way though. David Hu has been working on using this code within 
 the existing CLIs. I have prototypes for at least nova to talk to neutron 
 and cinder which i'm waiting for Kilo to push. From there it should be 
 easier to do

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-12 Thread Steven Hardy
On Thu, Sep 11, 2014 at 08:43:22PM -0400, Jamie Lennox wrote:
 
 
 - Original Message -
  From: Steven Hardy sha...@redhat.com
  To: OpenStack Development Mailing List (not for usage questions) 
  openstack-dev@lists.openstack.org
  Sent: Friday, 12 September, 2014 12:21:52 AM
  Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
  tokens leads to overall OpenStack fragility
  
  On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
   
   - Original Message -
From: Steven Hardy sha...@redhat.com
To: OpenStack Development Mailing List (not for usage questions)
openstack-dev@lists.openstack.org
Sent: Thursday, September 11, 2014 1:55:49 AM
Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
tokens leads to overall OpenStack fragility

On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a 
 similar
 pattern:
 
 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

We actually have this exact problem in Heat, which I'm currently trying
to
solve:

https://bugs.launchpad.net/heat/+bug/1306294

Can you clarify, is the issue either:

1. Create novaclient object with username/password
2. Do series of operations via the client object which eventually fail
after $n operations due to token expiry

or:

1. Create novaclient object with username/password
2. Some really long operation which means token expires in the course of
the service handling the request, blowing up and 500-ing

If the former, then it does sound like a client, or usage-of-client bug,
although note if you pass a *token* vs username/password (as is 
currently
done for glance and heat in tempest, because we lack the code to get the
token outside of the shell.py code..), there's nothing the client can 
do,
because you can't request a new token with longer expiry with a token...

However if the latter, then it seems like not really a client problem to
solve, as it's hard to know what action to take if a request failed
part-way through and thus things are in an unknown state.

This issue is a hard problem, which can possibly be solved by
switching to a trust scoped token (service impersonates the user), but
then
you're effectively bypassing token expiry via delegation which sits
uncomfortably with me (despite the fact that we may have to do this in
heat
 to solve the aforementioned bug)

 It seems like we should have a standard pattern that on token
 expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients
 do
 this.

As has been mentioned, using sessions may be one solution to this, and
AFAIK session support (where it doesn't already exist) is getting into
various clients via the work being carried out to add support for v3
keystone by David Hu:

https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

I see patches for Heat (currently gating), Nova and Ironic.

 I know we had to add that into Tempest because tempest runs can exceed
 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

I can't claim great experience with sessions yet, but AIUI you could do
something like:

from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

auth = v3.Password(auth_url=OS_AUTH_URL,
   username=USERNAME,
   password=PASSWORD,
   project_id=PROJECT,
   user_domain_name='default')
sess = session.Session(auth=auth)
ks = client.Client(session=sess)

And if you can pass the same session into the various clients tempest
creates then the Password auth-plugin code takes care of 
reauthenticating
if the token cached in the auth plugin object is expired, or nearly
expired:

https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120

So in the tempest case, it seems like it may be a case of migrating the
code creating the clients to use sessions instead of passing a token or
username/password into the client object?

That's my understanding of it atm anyway, hopefully jamielennox will be
along
soon with more details :)

Steve
   
   
   By clients here are you referring to the CLIs or the python libraries?
   Implementation is at different points with each.
  
  I think for both heat and tempest we're

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Sean Dague
On 09/10/2014 08:46 PM, Jamie Lennox wrote:
 
 - Original Message -
 From: Steven Hardy sha...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, September 11, 2014 1:55:49 AM
 Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
 tokens leads to overall OpenStack fragility

 On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:

 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

 We actually have this exact problem in Heat, which I'm currently trying to
 solve:

 https://bugs.launchpad.net/heat/+bug/1306294

 Can you clarify, is the issue either:

 1. Create novaclient object with username/password
 2. Do series of operations via the client object which eventually fail
 after $n operations due to token expiry

 or:

 1. Create novaclient object with username/password
 2. Some really long operation which means token expires in the course of
 the service handling the request, blowing up and 500-ing

 If the former, then it does sound like a client, or usage-of-client bug,
 although note if you pass a *token* vs username/password (as is currently
 done for glance and heat in tempest, because we lack the code to get the
 token outside of the shell.py code..), there's nothing the client can do,
 because you can't request a new token with longer expiry with a token...

 However if the latter, then it seems like not really a client problem to
 solve, as it's hard to know what action to take if a request failed
 part-way through and thus things are in an unknown state.

 This issue is a hard problem, which can possibly be solved by
 switching to a trust scoped token (service impersonates the user), but then
 you're effectively bypassing token expiry via delegation which sits
 uncomfortably with me (despite the fact that we may have to do this in heat
 to solve the aforementioned bug)

 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

 As has been mentioned, using sessions may be one solution to this, and
 AFAIK session support (where it doesn't already exist) is getting into
 various clients via the work being carried out to add support for v3
 keystone by David Hu:

 https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

 I see patches for Heat (currently gating), Nova and Ironic.

 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

 I can't claim great experience with sessions yet, but AIUI you could do
 something like:

 from keystoneclient.auth.identity import v3
 from keystoneclient import session
 from keystoneclient.v3 import client

 auth = v3.Password(auth_url=OS_AUTH_URL,
username=USERNAME,
password=PASSWORD,
project_id=PROJECT,
user_domain_name='default')
 sess = session.Session(auth=auth)
 ks = client.Client(session=sess)

 And if you can pass the same session into the various clients tempest
 creates then the Password auth-plugin code takes care of reauthenticating
 if the token cached in the auth plugin object is expired, or nearly
 expired:

 https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120

 So in the tempest case, it seems like it may be a case of migrating the
 code creating the clients to use sessions instead of passing a token or
 username/password into the client object?

 That's my understanding of it atm anyway, hopefully jamielennox will be along
 soon with more details :)

 Steve
 
 
 By clients here are you referring to the CLIs or the python libraries? 
 Implementation is at different points with each. 
 
 Sessions will handle automatically reauthenticating and retrying a request, 
 however it relies on the service throwing a 401 Unauthenticated error. If a 
 service is returning a 500 (or a timeout?) then there isn't much that a 
 client can/should do for that because we can't assume that trying again with 
 a new token will solve anything. 
 
 At the moment we have keystoneclient, novaclient, cinderclient, neutronclient, 
 and then a number of the smaller projects with support for sessions. That 
 obviously doesn't mean that existing users of that code have transitioned to 
 the newer way though. David Hu has been working on using this code within the 
 existing CLIs. I have prototypes for at least nova to talk to neutron and 
 cinder which i'm waiting for Kilo to push. From there it should be easier to 
 do this for other services. 
 
 For service

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Sean Dague
On 09/10/2014 11:55 AM, Steven Hardy wrote:
 On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:

 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user
 
 We actually have this exact problem in Heat, which I'm currently trying to
 solve:
 
 https://bugs.launchpad.net/heat/+bug/1306294
 
 Can you clarify, is the issue either:
 
 1. Create novaclient object with username/password
 2. Do series of operations via the client object which eventually fail
 after $n operations due to token expiry
 
 or:
 
 1. Create novaclient object with username/password
 2. Some really long operation which means token expires in the course of
 the service handling the request, blowing up and 500-ing

From what I can tell of the Nova bugs, both are issues. Honestly, it
would probably be really telling to set up a test env with 10s token
timeouts and see how badly things break. I expect that our expiration logic,
and how our components react to it, is actually a lot less coherent than
we believe.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Duncan Thomas
On 11 September 2014 03:17, Angus Lees g...@inodes.org wrote:

 (As inspired by eg kerberos)
 2. Ensure at some environmental/top layer that the advertised token lifetime
 exceeds the timeout set on the request, before making the request.  This
 implies (since there's no special handling in place) failing if the token was
 expired earlier than expected.

We've a related problem in cinder (cinder-backup uses the user's token
to talk to swift, and the backup can easily take longer than the token
expiry time) which could not be solved by this, since the time the
backup takes is unknown (compression, service and resource contention,
etc alter the time by multiple orders of magnitude)



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Steven Hardy
On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
 
 - Original Message -
  From: Steven Hardy sha...@redhat.com
  To: OpenStack Development Mailing List (not for usage questions) 
  openstack-dev@lists.openstack.org
  Sent: Thursday, September 11, 2014 1:55:49 AM
  Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
  tokens leads to overall OpenStack fragility
  
  On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
   Going through the untriaged Nova bugs, and there are a few on a similar
   pattern:
   
   Nova operation in progress takes a while
   Crosses keystone token expiration time
   Timeout thrown
   Operation fails
   Terrible 500 error sent back to user
  
  We actually have this exact problem in Heat, which I'm currently trying to
  solve:
  
  https://bugs.launchpad.net/heat/+bug/1306294
  
  Can you clarify, is the issue either:
  
  1. Create novaclient object with username/password
  2. Do series of operations via the client object which eventually fail
  after $n operations due to token expiry
  
  or:
  
  1. Create novaclient object with username/password
  2. Some really long operation which means token expires in the course of
  the service handling the request, blowing up and 500-ing
  
  If the former, then it does sound like a client, or usage-of-client bug,
  although note if you pass a *token* vs username/password (as is currently
  done for glance and heat in tempest, because we lack the code to get the
  token outside of the shell.py code..), there's nothing the client can do,
  because you can't request a new token with longer expiry with a token...
  
  However if the latter, then it seems like not really a client problem to
  solve, as it's hard to know what action to take if a request failed
  part-way through and thus things are in an unknown state.
  
  This issue is a hard problem, which can possibly be solved by
  switching to a trust scoped token (service impersonates the user), but then
  you're effectively bypassing token expiry via delegation which sits
  uncomfortably with me (despite the fact that we may have to do this in heat
   to solve the aforementioned bug)
  
   It seems like we should have a standard pattern that on token expiration
   the underlying code at least gives one retry to try to establish a new
   token to complete the flow, however as far as I can tell *no* clients do
   this.
  
  As has been mentioned, using sessions may be one solution to this, and
  AFAIK session support (where it doesn't already exist) is getting into
  various clients via the work being carried out to add support for v3
  keystone by David Hu:
  
  https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
  
  I see patches for Heat (currently gating), Nova and Ironic.
  
   I know we had to add that into Tempest because tempest runs can exceed 1
   hr, and we want to avoid random fails just because we cross a token
   expiration boundary.
  
  I can't claim great experience with sessions yet, but AIUI you could do
  something like:
  
  from keystoneclient.auth.identity import v3
  from keystoneclient import session
  from keystoneclient.v3 import client
  
  auth = v3.Password(auth_url=OS_AUTH_URL,
 username=USERNAME,
 password=PASSWORD,
 project_id=PROJECT,
 user_domain_name='default')
  sess = session.Session(auth=auth)
  ks = client.Client(session=sess)
  
  And if you can pass the same session into the various clients tempest
  creates then the Password auth-plugin code takes care of reauthenticating
  if the token cached in the auth plugin object is expired, or nearly
  expired:
  
  https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
  
  So in the tempest case, it seems like it may be a case of migrating the
  code creating the clients to use sessions instead of passing a token or
  username/password into the client object?
  
  That's my understanding of it atm anyway, hopefully jamielennox will be 
  along
  soon with more details :)
  
  Steve
 
 
 By clients here are you referring to the CLIs or the python libraries? 
 Implementation is at different points with each. 

I think for both heat and tempest we're talking about the python libraries
(Client objects).

 Sessions will handle automatically reauthenticating and retrying a request, 
 however it relies on the service throwing a 401 Unauthenticated error. If a 
 service is returning a 500 (or a timeout?) then there isn't much that a 
 client can/should do for that because we can't assume that trying again with 
 a new token will solve anything. 

Hmm, I was hoping it would reauthenticate based on the auth_ref
will_expire_soon, as it would fit better with our current usage of the
auth_ref in heat.
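
(For reference, this is the sort of check I mean -- a sketch only, where ks
is a keystoneclient Client like the one in the example quoted above, and
auth_ref is the AccessInfo object it already caches; whether the session's
auth plugin does this for us automatically is exactly the question:

# Before issuing the next request on the user's behalf, see whether the
# cached token is within five minutes of expiring and re-auth if so.
if ks.auth_ref.will_expire_soon(stale_duration=300):
    ks.authenticate()
)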

 
 At the moment we have keystoneclient, novaclient, cinderclient, neutronclient, 
 and then a number of the smaller

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Jamie Lennox


- Original Message -
 From: Sean Dague s...@dague.net
 To: openstack-dev@lists.openstack.org
 Sent: Thursday, 11 September, 2014 9:44:43 PM
 Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
 tokens leads to overall OpenStack fragility
 
 On 09/10/2014 08:46 PM, Jamie Lennox wrote:
  
  - Original Message -
  From: Steven Hardy sha...@redhat.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Sent: Thursday, September 11, 2014 1:55:49 AM
  Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
  tokens leads to overall OpenStack fragility
 
  On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
  Going through the untriaged Nova bugs, and there are a few on a similar
  pattern:
 
  Nova operation in progress takes a while
  Crosses keystone token expiration time
  Timeout thrown
  Operation fails
  Terrible 500 error sent back to user
 
  We actually have this exact problem in Heat, which I'm currently trying to
  solve:
 
  https://bugs.launchpad.net/heat/+bug/1306294
 
  Can you clarify, is the issue either:
 
  1. Create novaclient object with username/password
  2. Do series of operations via the client object which eventually fail
  after $n operations due to token expiry
 
  or:
 
  1. Create novaclient object with username/password
  2. Some really long operation which means token expires in the course of
  the service handling the request, blowing up and 500-ing
 
  If the former, then it does sound like a client, or usage-of-client bug,
  although note if you pass a *token* vs username/password (as is currently
  done for glance and heat in tempest, because we lack the code to get the
  token outside of the shell.py code..), there's nothing the client can do,
  because you can't request a new token with longer expiry with a token...
 
  However if the latter, then it seems like not really a client problem to
  solve, as it's hard to know what action to take if a request failed
  part-way through and thus things are in an unknown state.
 
  This issue is a hard problem, which can possibly be solved by
  switching to a trust scoped token (service impersonates the user), but
  then
  you're effectively bypassing token expiry via delegation which sits
  uncomfortably with me (despite the fact that we may have to do this in
  heat
   to solve the aforementioned bug)
 
  It seems like we should have a standard pattern that on token expiration
  the underlying code at least gives one retry to try to establish a new
  token to complete the flow, however as far as I can tell *no* clients do
  this.
 
  As has been mentioned, using sessions may be one solution to this, and
  AFAIK session support (where it doesn't already exist) is getting into
  various clients via the work being carried out to add support for v3
  keystone by David Hu:
 
  https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
 
  I see patches for Heat (currently gating), Nova and Ironic.
 
  I know we had to add that into Tempest because tempest runs can exceed 1
  hr, and we want to avoid random fails just because we cross a token
  expiration boundary.
 
  I can't claim great experience with sessions yet, but AIUI you could do
  something like:
 
  from keystoneclient.auth.identity import v3
  from keystoneclient import session
  from keystoneclient.v3 import client
 
  auth = v3.Password(auth_url=OS_AUTH_URL,
 username=USERNAME,
 password=PASSWORD,
 project_id=PROJECT,
 user_domain_name='default')
  sess = session.Session(auth=auth)
  ks = client.Client(session=sess)
 
  And if you can pass the same session into the various clients tempest
  creates then the Password auth-plugin code takes care of reauthenticating
  if the token cached in the auth plugin object is expired, or nearly
  expired:
 
  https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
 
  So in the tempest case, it seems like it may be a case of migrating the
  code creating the clients to use sessions instead of passing a token or
  username/password into the client object?
 
  That's my understanding of it atm anyway, hopefully jamielennox will be
  along
  soon with more details :)
 
  Steve
  
  
  By clients here are you referring to the CLIs or the python libraries?
  Implementation is at different points with each.
  
  Sessions will handle automatically reauthenticating and retrying a request,
  however it relies on the service throwing a 401 Unauthenticated error. If
  a service is returning a 500 (or a timeout?) then there isn't much that a
  client can/should do for that because we can't assume that trying again
  with a new token will solve anything.
  
  At the moment we have keystoneclient, novaclient, cinderclient,
  neutronclient and then a number of the smaller projects with support

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-11 Thread Jamie Lennox


- Original Message -
 From: Steven Hardy sha...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Friday, 12 September, 2014 12:21:52 AM
 Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
 tokens leads to overall OpenStack fragility
 
 On Wed, Sep 10, 2014 at 08:46:45PM -0400, Jamie Lennox wrote:
  
  - Original Message -
   From: Steven Hardy sha...@redhat.com
   To: OpenStack Development Mailing List (not for usage questions)
   openstack-dev@lists.openstack.org
   Sent: Thursday, September 11, 2014 1:55:49 AM
   Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying
   tokens leads to overall OpenStack fragility
   
   On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
Going through the untriaged Nova bugs, and there are a few on a similar
pattern:

Nova operation in progress takes a while
Crosses keystone token expiration time
Timeout thrown
Operation fails
Terrible 500 error sent back to user
   
   We actually have this exact problem in Heat, which I'm currently trying
   to
   solve:
   
   https://bugs.launchpad.net/heat/+bug/1306294
   
   Can you clarify, is the issue either:
   
   1. Create novaclient object with username/password
   2. Do series of operations via the client object which eventually fail
   after $n operations due to token expiry
   
   or:
   
   1. Create novaclient object with username/password
   2. Some really long operation which means token expires in the course of
   the service handling the request, blowing up and 500-ing
   
   If the former, then it does sound like a client, or usage-of-client bug,
   although note if you pass a *token* vs username/password (as is currently
   done for glance and heat in tempest, because we lack the code to get the
   token outside of the shell.py code..), there's nothing the client can do,
   because you can't request a new token with longer expiry with a token...
   
   However if the latter, then it seems like not really a client problem to
   solve, as it's hard to know what action to take if a request failed
   part-way through and thus things are in an unknown state.
   
   This issue is a hard problem, which can possibly be solved by
   switching to a trust scoped token (service impersonates the user), but
   then
   you're effectively bypassing token expiry via delegation which sits
   uncomfortably with me (despite the fact that we may have to do this in
   heat
    to solve the aforementioned bug)
   
It seems like we should have a standard pattern that on token
expiration
the underlying code at least gives one retry to try to establish a new
token to complete the flow, however as far as I can tell *no* clients
do
this.
   
   As has been mentioned, using sessions may be one solution to this, and
   AFAIK session support (where it doesn't already exist) is getting into
   various clients via the work being carried out to add support for v3
   keystone by David Hu:
   
   https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
   
   I see patches for Heat (currently gating), Nova and Ironic.
   
I know we had to add that into Tempest because tempest runs can exceed
1
hr, and we want to avoid random fails just because we cross a token
expiration boundary.
   
   I can't claim great experience with sessions yet, but AIUI you could do
   something like:
   
   from keystoneclient.auth.identity import v3
   from keystoneclient import session
   from keystoneclient.v3 import client
   
   auth = v3.Password(auth_url=OS_AUTH_URL,
  username=USERNAME,
  password=PASSWORD,
  project_id=PROJECT,
  user_domain_name='default')
   sess = session.Session(auth=auth)
   ks = client.Client(session=sess)
   
   And if you can pass the same session into the various clients tempest
   creates then the Password auth-plugin code takes care of reauthenticating
   if the token cached in the auth plugin object is expired, or nearly
   expired:
   
   https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
   
   So in the tempest case, it seems like it may be a case of migrating the
   code creating the clients to use sessions instead of passing a token or
   username/password into the client object?
   
   That's my understanding of it atm anyway, hopefully jamielennox will be
   along
   soon with more details :)
   
   Steve
  
  
  By clients here are you referring to the CLIs or the python libraries?
  Implementation is at different points with each.
 
 I think for both heat and tempest we're talking about the python libraries
 (Client objects).
 
  Sessions will handle automatically reauthenticating and retrying a request,
  however it relies on the service throwing a 401 Unauthenticated error

[openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Sean Dague
Going through the untriaged Nova bugs, and there are a few on a similar
pattern:

Nova operation in progress takes a while
Crosses keystone token expiration time
Timeout thrown
Operation fails
Terrible 500 error sent back to user

It seems like we should have a standard pattern that on token expiration
the underlying code at least gives one retry to try to establish a new
token to complete the flow, however as far as I can tell *no* clients do
this.
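
Something along these lines is the pattern I mean -- a sketch only, not any
particular client's code; get_token() stands in for however the caller
originally authenticated:

import requests

def request_with_token_retry(method, url, get_token, **kwargs):
    """Issue a request; on 401 (token expired), fetch a new token and retry once."""
    headers = dict(kwargs.pop('headers', {}))
    headers['X-Auth-Token'] = get_token()
    resp = requests.request(method, url, headers=headers, **kwargs)
    if resp.status_code == 401:
        # The token presumably expired mid-flow: re-authenticate and retry once.
        headers['X-Auth-Token'] = get_token()
        resp = requests.request(method, url, headers=headers, **kwargs)
    return resp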

I know we had to add that into Tempest because tempest runs can exceed 1
hr, and we want to avoid random fails just because we cross a token
expiration boundary.

Anyone closer to the clients that can comment here?

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Endre Karlson
I think at least clients supporting keystone sessions that are configured
to use the auth.Password mech support this, since re-auth is done by the
session rather than the service client itself.

2014-09-10 16:14 GMT+02:00 Sean Dague s...@dague.net:

 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:

 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

 Anyone closer to the clients that can comment here?

 -Sean

 --
 Sean Dague
 http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Sean Dague
Do we know which versions of the clients do that?

-Sean

On 09/10/2014 10:22 AM, Endre Karlson wrote:
 I think at least clients supporting keystone sessions that are
  configured to use the auth.Password mech support this, since re-auth is
  done by the session rather than the service client itself.
 
 2014-09-10 16:14 GMT+02:00 Sean Dague s...@dague.net
 mailto:s...@dague.net:
 
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:
 
 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user
 
 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.
 
 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.
 
 Anyone closer to the clients that can comment here?
 
 -Sean
 
 --
 Sean Dague
 http://dague.net
 
 


-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Steven Hardy
On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:
 
 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user

We actually have this exact problem in Heat, which I'm currently trying to
solve:

https://bugs.launchpad.net/heat/+bug/1306294

Can you clarify, is the issue either:

1. Create novaclient object with username/password
2. Do series of operations via the client object which eventually fail
after $n operations due to token expiry

or:

1. Create novaclient object with username/password
2. Some really long operation which means token expires in the course of
the service handling the request, blowing up and 500-ing

If the former, then it does sound like a client, or usage-of-client bug,
although note if you pass a *token* vs username/password (as is currently
done for glance and heat in tempest, because we lack the code to get the
token outside of the shell.py code..), there's nothing the client can do,
because you can't request a new token with longer expiry with a token...

However if the latter, then it seems like not really a client problem to
solve, as it's hard to know what action to take if a request failed
part-way through and thus things are in an unknown state.

This issue is a hard problem, which can possibly be solved by
switching to a trust-scoped token (the service impersonates the user), but then
you're effectively bypassing token expiry via delegation, which sits
uncomfortably with me (despite the fact that we may have to do this in heat
to solve the aforementioned bug).
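
For what it's worth, creating such a trust via the v3 keystoneclient API
might look roughly like the sketch below (untested; the user and project IDs
are placeholders, and ks is assumed to be a v3 Client authenticated as the
trustor):

# Delegate the trustor's roles on a project to a service user via a trust.
trust = ks.trusts.create(trustor_user=user_id,
                         trustee_user=service_user_id,
                         project=project_id,
                         role_names=['Member'],
                         impersonation=True)
# The trustee can later request a trust-scoped token using trust.id and its
# own credentials, and act on the user's behalf until the trust is deleted.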

 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

As has been mentioned, using sessions may be one solution to this, and
AFAIK session support (where it doesn't already exist) is getting into
various clients via the work being carried out to add support for v3
keystone by David Hu:

https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z

I see patches for Heat (currently gating), Nova and Ironic.

 I know we had to add that into Tempest because tempest runs can exceed 1
 hr, and we want to avoid random fails just because we cross a token
 expiration boundary.

I can't claim great experience with sessions yet, but AIUI you could do
something like:

from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client

auth = v3.Password(auth_url=OS_AUTH_URL,
                   username=USERNAME,
                   password=PASSWORD,
                   project_id=PROJECT,
                   user_domain_name='default')
sess = session.Session(auth=auth)
ks = client.Client(session=sess)

And if you can pass the same session into the various clients tempest
creates then the Password auth-plugin code takes care of reauthenticating
if the token cached in the auth plugin object is expired, or nearly
expired:

https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120

So in the tempest case, it seems like it may be a case of migrating the
code creating the clients to use sessions instead of passing a token or
username/password into the client object?
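
Something like the following (untested, and assuming a novaclient release
that accepts a session argument) is the sort of thing I mean:

from novaclient import client as nova_client

# Reuse the session built above; the Password plugin inside it requests a
# fresh token whenever the cached one is expired or close to expiry.
nova = nova_client.Client('2', session=sess)
servers = nova.servers.list()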

That's my understanding of it atm anyway, hopefully jamielennox will be along
soon with more details :)

Steve



Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Jamie Lennox

- Original Message -
 From: Steven Hardy sha...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 Sent: Thursday, September 11, 2014 1:55:49 AM
 Subject: Re: [openstack-dev] [all] [clients] [keystone] lack of retrying 
 tokens leads to overall OpenStack fragility
 
 On Wed, Sep 10, 2014 at 10:14:32AM -0400, Sean Dague wrote:
  Going through the untriaged Nova bugs, and there are a few on a similar
  pattern:
  
  Nova operation in progress takes a while
  Crosses keystone token expiration time
  Timeout thrown
  Operation fails
  Terrible 500 error sent back to user
 
 We actually have this exact problem in Heat, which I'm currently trying to
 solve:
 
 https://bugs.launchpad.net/heat/+bug/1306294
 
 Can you clarify, is the issue either:
 
 1. Create novaclient object with username/password
 2. Do series of operations via the client object which eventually fail
 after $n operations due to token expiry
 
 or:
 
 1. Create novaclient object with username/password
 2. Some really long operation which means token expires in the course of
 the service handling the request, blowing up and 500-ing
 
 If the former, then it does sound like a client, or usage-of-client bug,
 although note if you pass a *token* vs username/password (as is currently
 done for glance and heat in tempest, because we lack the code to get the
 token outside of the shell.py code..), there's nothing the client can do,
 because you can't request a new token with longer expiry with a token...
 
 However if the latter, then it seems like not really a client problem to
 solve, as it's hard to know what action to take if a request failed
 part-way through and thus things are in an unknown state.
 
 This issue is a hard problem, which can possibly be solved by
 switching to a trust scoped token (service impersonates the user), but then
 you're effectively bypassing token expiry via delegation which sits
 uncomfortably with me (despite the fact that we may have to do this in heat
 to solve the aforementioned bug)
 
  It seems like we should have a standard pattern that on token expiration
  the underlying code at least gives one retry to try to establish a new
  token to complete the flow, however as far as I can tell *no* clients do
  this.
 
 As has been mentioned, using sessions may be one solution to this, and
 AFAIK session support (where it doesn't already exist) is getting into
 various clients via the work being carried out to add support for v3
 keystone by David Hu:
 
 https://review.openstack.org/#/q/owner:david.hu%2540hp.com,n,z
 
 I see patches for Heat (currently gating), Nova and Ironic.
 
  I know we had to add that into Tempest because tempest runs can exceed 1
  hr, and we want to avoid random fails just because we cross a token
  expiration boundary.
 
 I can't claim great experience with sessions yet, but AIUI you could do
 something like:
 
 from keystoneclient.auth.identity import v3
 from keystoneclient import session
 from keystoneclient.v3 import client
 
 auth = v3.Password(auth_url=OS_AUTH_URL,
username=USERNAME,
password=PASSWORD,
project_id=PROJECT,
user_domain_name='default')
 sess = session.Session(auth=auth)
 ks = client.Client(session=sess)
 
 And if you can pass the same session into the various clients tempest
 creates then the Password auth-plugin code takes care of reauthenticating
 if the token cached in the auth plugin object is expired, or nearly
 expired:
 
 https://github.com/openstack/python-keystoneclient/blob/master/keystoneclient/auth/identity/base.py#L120
 
 So in the tempest case, it seems like it may be a case of migrating the
 code creating the clients to use sessions instead of passing a token or
 username/password into the client object?
 
 That's my understanding of it atm anyway, hopefully jamielennox will be along
 soon with more details :)
 
 Steve


By clients here are you referring to the CLIs or the python libraries? 
Implementation is at different points with each. 

Sessions will handle automatically reauthenticating and retrying a request; 
however, this relies on the service returning a 401 Unauthorized error. If a 
service returns a 500 (or a timeout) then there isn't much that a client 
can or should do, because we can't assume that trying again with a new 
token will solve anything. 

At the moment we have keystoneclient, novaclient, cinderclient, neutronclient 
and then a number of the smaller projects with support for sessions. That 
obviously doesn't mean that existing users of that code have transitioned to 
the newer way though. David Hu has been working on using this code within the 
existing CLIs. I have prototypes for at least nova talking to neutron and 
cinder which I'm waiting for Kilo to push. From there it should be easier to do 
this for other services. 

For service to service communication

Re: [openstack-dev] [all] [clients] [keystone] lack of retrying tokens leads to overall OpenStack fragility

2014-09-10 Thread Angus Lees
On Wed, 10 Sep 2014 10:14:32 AM Sean Dague wrote:
 Going through the untriaged Nova bugs, and there are a few on a similar
 pattern:
 
 Nova operation in progress takes a while
 Crosses keystone token expiration time
 Timeout thrown
 Operation fails
 Terrible 500 error sent back to user
 
 It seems like we should have a standard pattern that on token expiration
 the underlying code at least gives one retry to try to establish a new
 token to complete the flow, however as far as I can tell *no* clients do
 this.

This came up in conversation a few weeks ago in the context of the ironic 
client, so I'll chime in.  I've read some docs and written a keystone client, 
but I'm not super-familiar with keystone internals - apologies if I miss 
something fundamental.


There are two broadly different approaches to dealing with this:

(As described by Sean, and implemented in a few clients)
1. At the bottom layer, try to refresh the token and immediately retry 
whenever a server response indicates the token has expired.

(As inspired by eg kerberos)
2. Ensure at some environmental/top layer that the advertised token lifetime 
exceeds the timeout set on the request, before making the request.  This 
implies (since there's no special handling in place) failing if the token was 
expired earlier than expected.


The primary distinction is that in (2) the client is ignorant of how to 
create tokens, and just assumes they're valid.
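
To make (1) concrete, here is a minimal sketch; get_token and refresh_token 
are placeholders for however the client actually obtains and re-requests 
tokens:

import requests

def request_with_reauth(method, url, get_token, refresh_token, **kwargs):
    # Approach (1): if the server rejects the token, fetch a new one and
    # retry the request exactly once.
    headers = dict(kwargs.pop('headers', {}))
    headers['X-Auth-Token'] = get_token()
    resp = requests.request(method, url, headers=headers, **kwargs)
    if resp.status_code == 401:
        headers['X-Auth-Token'] = refresh_token()
        resp = requests.request(method, url, headers=headers, **kwargs)
    return resp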

(2) is particularly easy to code for simple one-shot command line clients.  
For a persistent client, the easiest approach is probably to have an 
asynchronous loop that just keeps refreshing the stored token whenever it 
comes within max_single_request_timeout of expiry.
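
As a sketch of that loop (untested; issue_token is a placeholder returning a 
(token, unix_expiry) pair):

import threading
import time

class TokenCache(object):
    # Approach (2): a background loop keeps the stored token fresh, so request
    # code can simply assume the token's remaining lifetime always exceeds the
    # longest single request timeout.
    def __init__(self, issue_token, max_single_request_timeout):
        self._issue_token = issue_token
        self._margin = max_single_request_timeout
        self.token, self._expires_at = issue_token()
        refresher = threading.Thread(target=self._refresh_loop)
        refresher.daemon = True
        refresher.start()

    def _refresh_loop(self):
        while True:
            remaining = self._expires_at - time.time()
            if remaining <= self._margin:
                self.token, self._expires_at = self._issue_token()
                remaining = self._expires_at - time.time()
            time.sleep(max(remaining - self._margin, 1))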

My concern with (1) is that it involves passing username/password all the way 
down to the bottom layers - see the heat example, where this means crossing 
into another program/service.  Moreover, if the token expired earlier than 
advertised it probably means the admin has deliberately revoked the user's 
access, and the intent is that they _should_ be locked out - it would be 
unfortunate to have a synchronised retry storm hit keystone from all the 
rejected clients at that point :/

-- 
 - Gus
