Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-11-01 Thread Day, Phil
Hi Vish,

I probably wasn't careful enough with my wording - the API server may not be 
threaded as such, but the use of eventlets gives effectively the same 
concurrency issues that point towards needing to use scoped sessions.

Our basis for concluding that this is some form of concurrency issue is that we 
can easily reproduce the issue by running concurrent requests into an API 
server, and we have seen the problem disappear if we reduce the eventlet pool 
to 1 or change to scoped sessions.   Whilst the symptom is that the session has 
terminated by the time the lazy load is requested, as far as we can see the 
eventlet handing the query hasn't itself terminated the session - although it 
does seem likely that another eventlet using the same shared session could 
have. This seems to be specifically the type of issue that scoped sessions are 
intended to address.

http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions

All of this is based on a limited understanding of how sqlalchemy is used in 
Nova - I'd be more than happy to be corrected by others with more experience, 
hence the question to the mailing list.

I fully understand the drive to clean up the database layer, and I'm not 
knocking the fix to 855660 - its clearly a good template for the way the DB 
needs to go in Essex.   My concern is that as shown by 855660 these changes 
have a pretty wide scope, and by the time that's been expanded to all of the 
current joinedloads it feels like it would be such a large set of changes that 
I'd be concerned about them coming back into Diablo.Stable.

Hence instead we were looking for a much smaller change that can address the 
whole class of problem of joinedloads in Diablo for now ahead of the DB 
refactoring in Essex - and from our testing scoped sessions seem to address 
that.  However as changing to scoped session breaks the migrate code in unit 
tests, and not really understanding why this is or the intricacies of the DB 
unit tests I wanted to see if we were heading down a path that had already been 
examined and discarded before we spend too much time on it.

I'd be really interested in hearing from anyone with experience of 
scoped_sessions, and/or willing to help us understand the issues we're seeing 
in the Unit Tests.

And of course I'd like to know what the communities feeling is towards a 
simpler approach to fixing the issue in Diablo.Final vs the backport of DB 
simplification changes from Essex - which I'm assuming will take some tiem yet 
to work through all of the joinedloads.

Phil

From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
Sent: 31 October 2011 19:50
To: Day, Phil
Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net); Johnson, 
Andrew Gordon (HP Cloud Services); Hassan, Ahmad; Haynes, David; 
nova-datab...@lists.launchpad.net
Subject: Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in 
sqlalchemy ?

All of the workers are single-threaded, so I'm not sure that scoped sessions 
are really necessary.

We did however decide that objects from the db layer are supposed to be simple 
dictionaries.  We currently allow nested dictionaries to optimize joined 
objects. Unfortunately we never switched to sanitizing data from sqlalchemy, 
and instead we make the sqlalchemy objects provide a dictionary-like interface 
and pass the object itself.

The issue that you're seeing is because network wasn't properly 'joinedload'ed 
in the initial query, and because the data is not sanitized, sqlalchemy tries 
to joinedload, but the session has been terminated.  If we had sanitized data, 
we would get a more useful error like a key error when network is accessed. The 
current solution is to add the proper joinedload.

One of the goals of the nova-database team is to do the necessary data 
sanitization and to remove as many of the joinedloads as possible (hopefully 
all of them).

Vish

On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:


Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 
smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in 
__call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in 
invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in 
describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204

Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-11-01 Thread Vishvananda Ishaya
I'm still a bit unconvinced for two reasons:

a) clobberring a session would require a monkeypatched call while the session 
is still open. AFAIK we don't have any calls in sqlalchemy/api.py that are 
doing any fancy socket stuff, so calls through the db layer should be 
happening synchronously

b) if this were really an issue it seems like we would see it all the time, 
instead of just under a specific set of circumstances.

It would be great to force a simple repro test case that proves that this is 
blowing up. That said, I have no issue switching over to scoped sessions. We 
discussed it at one point and felt that it wasn't necessary, but if you are 
reasonably sure that is the issue, we can definitely switch.

I would expect that the issues with the migrations and the test cases are 
fixable.

Vish

On Nov 1, 2011, at 7:20 AM, Day, Phil wrote:

 Hi Vish,
  
 I probably wasn’t careful enough with my wording – the API server may not be 
 threaded as such, but the use of eventlets gives effectively the same 
 concurrency issues that point towards needing to use scoped sessions.
  
 Our basis for concluding that this is some form of concurrency issue is that 
 we can easily reproduce the issue by running concurrent requests into an API 
 server, and we have seen the problem disappear if we reduce the eventlet pool 
 to 1 or change to scoped sessions.   Whilst the symptom is that the session 
 has terminated by the time the lazy load is requested, as far as we can see 
 the eventlet handing the query hasn’t itself terminated the session – 
 although it does seem likely that another eventlet using the same shared 
 session could have. This seems to be specifically the type of issue that 
 scoped sessions are intended to address.   
  
 http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions
  
 All of this is based on a limited understanding of how sqlalchemy is used in 
 Nova – I’d be more than happy to be corrected by others with more experience, 
 hence the question to the mailing list. 
  
 I fully understand the drive to clean up the database layer, and I’m not 
 knocking the fix to 855660 – its clearly a good template for the way the DB 
 needs to go in Essex.   My concern is that as shown by 855660 these changes 
 have a pretty wide scope, and by the time that’s been expanded to all of the 
 current joinedloads it feels like it would be such a large set of changes 
 that I’d be concerned about them coming back into Diablo.Stable.
  
 Hence instead we were looking for a much smaller change that can address the 
 whole class of problem of joinedloads in Diablo for now ahead of the DB 
 refactoring in Essex – and from our testing scoped sessions seem to address 
 that.  However as changing to scoped session breaks the migrate code in unit 
 tests, and not really understanding why this is or the intricacies of the DB 
 unit tests I wanted to see if we were heading down a path that had already 
 been examined and discarded before we spend too much time on it.
  
 I’d be really interested in hearing from anyone with experience of 
 scoped_sessions, and/or willing to help us understand the issues we’re seeing 
 in the Unit Tests.
  
 And of course I’d like to know what the communities feeling is towards a 
 simpler approach to fixing the issue in Diablo.Final vs the backport of DB 
 simplification changes from Essex – which I’m assuming will take some tiem 
 yet to work through all of the joinedloads.
  
 Phil
  
 From: Vishvananda Ishaya [mailto:vishvana...@gmail.com] 
 Sent: 31 October 2011 19:50
 To: Day, Phil
 Cc: openstack@lists.launchpad.net (openstack@lists.launchpad.net); Johnson, 
 Andrew Gordon (HP Cloud Services); Hassan, Ahmad; Haynes, David; 
 nova-datab...@lists.launchpad.net
 Subject: Re: [Openstack] Is there a reason Nova doesn't use scoped sessions 
 in sqlalchemy ?
  
 All of the workers are single-threaded, so I'm not sure that scoped sessions 
 are really necessary.
  
 We did however decide that objects from the db layer are supposed to be 
 simple dictionaries.  We currently allow nested dictionaries to optimize 
 joined objects. Unfortunately we never switched to sanitizing data from 
 sqlalchemy, and instead we make the sqlalchemy objects provide a 
 dictionary-like interface and pass the object itself.
  
 The issue that you're seeing is because network wasn't properly 
 'joinedload'ed in the initial query, and because the data is not sanitized, 
 sqlalchemy tries to joinedload, but the session has been terminated.  If we 
 had sanitized data, we would get a more useful error like a key error when 
 network is accessed. The current solution is to add the proper joinedload.
  
 One of the goals of the nova-database team is to do the necessary data 
 sanitization and to remove as many of the joinedloads as possible (hopefully 
 all of them).
  
 Vish
  
 On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:
 
 
 Hi Folks,
  
 We’ve been looking into a problem

[Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-10-31 Thread Day, Phil
Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 
smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in 
__call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in 
invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in 
describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204, in 
_format_describe_instances
(nova.api): TRACE: return {'reservationSet': self._format_instances(context, 
**kwargs)}
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1309, in 
_format_instances
(nova.api): TRACE: if fixed['network'] and use_v6:
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py, line 76, in 
__getitem__
(nova.api): TRACE: return getattr(self, key)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 163, in 
__get__
(nova.api): TRACE: instance_dict(instance))
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 383, in 
get
(nova.api): TRACE: value = callable_(passive=passive)
(nova.api): TRACE: File 
/usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py, line 595, in 
__call__
(nova.api): TRACE: (mapperutil.state_str(state), self.key)
(nova.api): TRACE: DetachedInstanceError: Parent instance FixedIp at 
0x4e74490 is not bound to a Session; lazy load operation of attribute 
'network' cannot proceed
(nova.api): TRACE:


As far as we can see the problem seems to be related to some conflict between 
multiple threads in the same API server instance and lazy loading of some part 
of the object.

Looking at the sqlalchemy documentation it seems to strongly suggest that when 
used from multi-threaded WSGI applications that scoped_sessions should be used 
(I'm not clear on the details but it seems that this effectively makes lazy 
load operations thread safe).However whilst this fixes the problem it has a 
bad effect on the unit tests - in particular it seems to upset all of the DB 
migration code used in the unit tests.

So does anyone know if there was an explicit decision / reason not to use 
scoped_sessions in Nova ?

Thanks,
Phil

PS:  The other possible fix we've found is to change sqlalchemy/models.py so 
that the associations are explicitly set to use eager load - which also seems 
to fix the problem but feels like a more clumsy way to go about it.   Any 
thoughts on that would also be appreciated ?



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-10-31 Thread Jay Pipes
On Mon, Oct 31, 2011 at 3:25 PM, Day, Phil philip@hp.com wrote:
 Hi Folks,

 We’ve been looking into a problem which looks a lot like:

 https://bugs.launchpad.net/nova/+bug/855660

 2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7
 smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at
 0x4e74490 is not bound to a Session; lazy load operation of attribute
 'network' cannot proceed
 (nova.api): TRACE: Traceback (most recent call last):
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in
 __call__
 (nova.api): TRACE: result = api_request.invoke(context)
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in
 invoke
 (nova.api): TRACE: result = method(context, **args)
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in
 describe_instances
 (nova.api): TRACE: instance_id=instance_id)
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204, in
 _format_describe_instances
 (nova.api): TRACE: return {'reservationSet': self._format_instances(context,
 **kwargs)}
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1309, in
 _format_instances
 (nova.api): TRACE: if fixed['network'] and use_v6:
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py, line 76, in
 __getitem__
 (nova.api): TRACE: return getattr(self, key)
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 163,
 in __get__
 (nova.api): TRACE: instance_dict(instance))
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 383,
 in get
 (nova.api): TRACE: value = callable_(passive=passive)
 (nova.api): TRACE: File
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py, line 595,
 in __call__
 (nova.api): TRACE: (mapperutil.state_str(state), self.key)
 (nova.api): TRACE: DetachedInstanceError: Parent instance FixedIp at
 0x4e74490 is not bound to a Session; lazy load operation of attribute
 'network' cannot proceed
 (nova.api): TRACE:

Have you tried backporting the fix for the bug you mentioned above to
your local branch? Did that have any affect?

Thanks!
-jay

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

2011-10-31 Thread Vishvananda Ishaya
All of the workers are single-threaded, so I'm not sure that scoped sessions 
are really necessary.

We did however decide that objects from the db layer are supposed to be simple 
dictionaries.  We currently allow nested dictionaries to optimize joined 
objects. Unfortunately we never switched to sanitizing data from sqlalchemy, 
and instead we make the sqlalchemy objects provide a dictionary-like interface 
and pass the object itself.

The issue that you're seeing is because network wasn't properly 'joinedload'ed 
in the initial query, and because the data is not sanitized, sqlalchemy tries 
to joinedload, but the session has been terminated.  If we had sanitized data, 
we would get a more useful error like a key error when network is accessed. The 
current solution is to add the proper joinedload.

One of the goals of the nova-database team is to do the necessary data 
sanitization and to remove as many of the joinedloads as possible (hopefully 
all of them).

Vish

On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:

 Hi Folks,
  
 We’ve been looking into a problem which looks a lot like:
  
 https://bugs.launchpad.net/nova/+bug/855660
  
  
  
 2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 
 smokeuser smokeproject] Unexpected error raised: Parent instance FixedIp at 
 0x4e74490 is not bound to a Session; lazy load operation of attribute 
 'network' cannot proceed 
 (nova.api): TRACE: Traceback (most recent call last): 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py, line 363, in 
 __call__ 
 (nova.api): TRACE: result = api_request.invoke(context) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py, line 90, in 
 invoke 
 (nova.api): TRACE: result = method(context, **args) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1195, in 
 describe_instances 
 (nova.api): TRACE: instance_id=instance_id) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1204, in 
 _format_describe_instances 
 (nova.api): TRACE: return {'reservationSet': self._format_instances(context, 
 **kwargs)} 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py, line 1309, in 
 _format_instances 
 (nova.api): TRACE: if fixed['network'] and use_v6: 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py, line 76, in 
 __getitem__ 
 (nova.api): TRACE: return getattr(self, key) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 163, in 
 __get__ 
 (nova.api): TRACE: instance_dict(instance)) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py, line 383, in 
 get 
 (nova.api): TRACE: value = callable_(passive=passive) 
 (nova.api): TRACE: File 
 /usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py, line 595, in 
 __call__ 
 (nova.api): TRACE: (mapperutil.state_str(state), self.key) 
 (nova.api): TRACE: DetachedInstanceError: Parent instance FixedIp at 
 0x4e74490 is not bound to a Session; lazy load operation of attribute 
 'network' cannot proceed 
 (nova.api): TRACE:
  
  
 As far as we can see the problem seems to be related to some conflict between 
 multiple threads in the same API server instance and lazy loading of some 
 part of the object.
  
 Looking at the sqlalchemy documentation it seems to strongly suggest that 
 when used from multi-threaded WSGI applications that scoped_sessions should 
 be used (I’m not clear on the details but it seems that this effectively 
 makes lazy load operations thread safe).However whilst this fixes the 
 problem it has a bad effect on the unit tests – in particular it seems to 
 upset all of the DB migration code used in the unit tests.
  
 So does anyone know if there was an explicit decision / reason not to use 
 scoped_sessions in Nova ?
  
 Thanks,
 Phil
  
 PS:  The other possible fix we’ve found is to change sqlalchemy/models.py so 
 that the associations are explicitly set to use eager load – which also seems 
 to fix the problem but feels like a more clumsy way to go about it.   Any 
 thoughts on that would also be appreciated ?
  
  
  
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp