Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Mike Bayer



On 8/11/15 7:14 PM, Sachin Manpathak wrote:
I am struggling with python code profiling in general. It has its own
caveats, like 100%-plus overhead.
However, on a host running only nova services (DB on a different host), I
see CPU utilization spike up quickly with scale. The DB server is
relatively calm and never goes over 20%. On a system which relies on the
DB to fetch all the data, this should not happen.

The DB's resources are intended to scale up in response to a wide degree
of concurrency, that is, lots and lots of API services all hitting it
from many concurrent API calls.  "With scale" here is a slippery
term.  What kind of concurrency are you testing with?  How many CPUs
serving API calls are utilized simultaneously?  To saturate the
database you need many dozens, and even then you don't want your
database CPU going very high.  20% does not seem that low to me,
actually.

I disagree with the concept that high database CPU indicates a
performant application, or that DB saturation is a requirement for a
database-delivered application to be performant; I think the
opposite is true. In web application development, when I worked with
production sites at high volume, the goal was to use enough caching so
that major site pages being viewed constantly could be delivered with
*no* database access whatsoever. We wanted to see the majority of the
site being sent to customers with the database at essentially zero; this
is how you get page response times down from 200-300 ms to 20 or
30.  If you want to measure performance, looking at API response
time is probably better than looking at CPU utilization first.


That said, Python is a very CPU intensive language, because it is an
interpreted scripting language.   Operations that would be hardly a
whisper of CPU in a compiled language like C end up being major
operations in Python. Openstack suffers from a large amount of
function call overhead even for simple API operations, as it is an
extremely layered system with very little use of caching.   Until it
moves to a JIT-based interpreter like PyPy that can flatten out
call-chains, the amount of overhead just for an API call to come in and
go back out with a response will remain significant.   As for caching,
making use of a technique such as memcached caching of data structures
can also greatly improve performance, because we can cache pre-assembled
data, removing the need to repeatedly extract it from multiple tables to
be pieced together in Python, which is also a very CPU intensive
activity.   This is something that will be happening more in the future,
but as it improves the performance of Openstack, it will be removing
even more load from the database. Again, I'd look at API response times
as the first thing to measure.
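
To make that concrete, here is a minimal sketch of the kind of
data-structure caching meant above, using dogpile.cache with a memcached
backend; the server address, expiration time, and the
_load_instance_view() helper are all illustrative assumptions, not
anything in Nova today:

    # Minimal sketch: cache a pre-assembled data structure in memcached.
    # Server address, TTL, and _load_instance_view() are hypothetical.
    from dogpile.cache import make_region

    region = make_region().configure(
        'dogpile.cache.memcached',
        expiration_time=60,  # hit the DB at most once a minute per key
        arguments={'url': ['127.0.0.1:11211']},
    )

    def _load_instance_view(instance_uuid):
        # Stand-in for the expensive multi-table fetch + assembly step.
        return {'uuid': instance_uuid, 'metadata': {}, 'flavor': {}}

    @region.cache_on_arguments()
    def get_instance_view(instance_uuid):
        # On a cache hit this returns with no database access at all.
        return _load_instance_view(instance_uuid)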


That said, the joining of data in Python may certainly be unnecessary,
and I'm not sure we can't revisit the history Dan refers to when he
says there were very large result sets. If we are referring to the
number of rows, joining in SQL or in Python will still involve the same
number of rows, and SQLAlchemy also offers many techniques for
optimizing the overhead of fetching lots of rows which Nova currently
doesn't make use of (see
https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Eager_load_and_Column_load_tuning
for a primer on this).
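
As one concrete illustration of the column-load tuning covered there, a
query can be told to fetch only the columns it actually needs; the
Instance model and session below are illustrative, not Nova's real code:

    # Sketch: fetch only three columns instead of the full (very wide)
    # instances row. Model and session names are illustrative.
    from sqlalchemy.orm import load_only

    instances = (
        session.query(Instance)
        .options(load_only('uuid', 'host', 'vm_state'))
        .filter(Instance.host == 'compute-01')
        .all()
    )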


If OTOH we are referring to the width of the columns and the join is 
such that you're going to get the same A identity over and over again,  
if you join A and B you get a wide row with all of A and B with a very 
large amount of redundant data sent over the wire again and again (note 
that the database drivers available to us in Python always send all rows 
and columns over the wire unconditionally, whether or not we fetch them 
in application code).  In this case you *do* want to do the join in
Python to some extent, though you use the database to deliver the
simplest information possible to work with first: you get the full row
for all of the A entries, then a second query for all of B plus A's
primary key that can be quickly matched to that of A.  SQLAlchemy
offers this as subquery eager loading, and it is definitely much more
performant than a single full join when you have wide rows for
individual entities.  The database is doing the join to the extent
that it can deliver the primary key information for A and B, which can
be operated upon very quickly in memory, as we already have all the A
identities in a hash lookup in any case.
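
A rough sketch of what that looks like, with illustrative models (an A
with a one-to-many relationship "bs" to B; not Nova's schema):

    # Sketch: subquery eager loading. SQLAlchemy emits one query for the
    # A rows and a second query for the B rows keyed to A's primary key,
    # then matches them in memory -- no wide A-times-B rows on the wire.
    from sqlalchemy.orm import subqueryload

    a_rows = (
        session.query(A)
        .options(subqueryload(A.bs))
        .all()
    )
    # Roughly: query 1 loads all A rows; query 2 selects B joined against
    # a subquery of A's primary keys, so each A row crosses the wire once.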


Overall, if you're looking to make Openstack faster, where you want to be
is: 1. what is the response time of an API call, and 2. what do the
Python profiles look like for those API calls?  For a primer on Python
profiling see for example my own FAQ entry here:
http://docs.sqlalchemy.org/en/rel_1_0/faq/performance.html#code-profiling.
This kind of profiling is a lot of work and is very tedious compared to
just running a big rally job.
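
For reference, the core of that approach is just the standard library; a
minimal harness looks roughly like this, where call_api() is a
hypothetical stand-in for whatever operation you are measuring:

    # Sketch: profile one operation and show the hottest call paths.
    import cProfile
    import pstats

    def call_api():
        pass  # hypothetical stand-in for the API call under test

    profiler = cProfile.Profile()
    profiler.enable()
    call_api()
    profiler.disable()

    stats = pstats.Stats(profiler).sort_stats('cumulative')
    stats.print_stats(25)  # top 25 entries usually exposes the hotspots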

Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Dan Smith
 If OTOH we are referring to the width of the columns and the join is
 such that you're going to get the same A identity over and over again, 
 if you join A and B you get a wide row with all of A and B with a very
 large amount of redundant data sent over the wire again and again (note
 that the database drivers available to us in Python always send all rows
 and columns over the wire unconditionally, whether or not we fetch them
 in application code).

Yep, it was this. N instances times M rows of metadata each. If you pull
100 instances and they each have 30 rows of system metadata, that's a
lot of data, and most of it is the instance being repeated 30 times for
each metadata row. When we first released code doing this, a prominent
host immediately raised the red flag because their DB traffic shot
through the roof.
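
To put rough, illustrative numbers on it: if an instance row is on the
order of 2 KB and a metadata row only a few dozen bytes, the joined
result ships 100 x 30 = 3000 copies of instance data (around 6 MB),
while two separate queries move 100 instance rows plus 3000 small
metadata rows -- a few hundred KB at most.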

 In this case you *do* want to do the join in
 Python to some extent, though you use the database to deliver the
 simplest information possible to work with first; you get the full row
 for all of the A entries, then a second query for all of B plus A's
 primary key that can be quickly matched to that of A.

This is what we're doing. Fetch the list of instances that match the
filters, then for the ones that were returned, get their metadata.
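
In sketch form (illustrative models and columns, not Nova's actual
code), the pattern is:

    from collections import defaultdict

    # Query 1: the instances matching the filters.
    instances = (
        session.query(Instance)
        .filter(Instance.host == host)
        .all()
    )

    # Query 2: all metadata rows for just those instances.
    uuids = [inst.uuid for inst in instances]
    meta_rows = (
        session.query(InstanceMetadata)
        .filter(InstanceMetadata.instance_uuid.in_(uuids))
        .all()
    )

    # O(n) in-memory join: group metadata by instance UUID, then attach.
    meta_by_uuid = defaultdict(dict)
    for row in meta_rows:
        meta_by_uuid[row.instance_uuid][row.key] = row.value
    results = [(inst, meta_by_uuid.get(inst.uuid, {})) for inst in instances]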

--Dan



Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Dan Smith
 In the past I've taken a different approach to problematic
 one-to-many relationships and have made the metadata a binary JSON blob. Is
 there some reason that won't work?

We have done that for various pieces of data that were previously in
system_metadata. Where this breaks down is if you need to be able to
select instances based on keys in the metadata blob, which we do in
various scheduling operations (certainly for aggregate metadata, at
least). I *believe* we have to leave metadata as row-based for that
reason (although honestly I don't remember the details), and probably
system_metadata as well, but I'd have to survey what is left in there.
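
The kind of query that forces this, sketched with illustrative models
(not the real scheduler code), is a filter on a metadata key/value pair,
which is a plain join when each pair is its own row:

    # Sketch: select instances carrying a given metadata key/value pair.
    wanted = (
        session.query(Instance)
        .join(InstanceMetadata,
              InstanceMetadata.instance_uuid == Instance.uuid)
        .filter(InstanceMetadata.key == 'group',
                InstanceMetadata.value == 'db')
        .all()
    )
    # With the pairs serialized into a single JSON text blob, the same
    # filter would mean fetching and decoding every blob in Python (the
    # MySQL of this era has no JSON column type to query server-side).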

 Since the metadata is nearly always queried as a whole, this seems
 like a valid approach that would keep DB traffic low but also ease
 the burden of reassembling the collection in nova-api.

'Nearly' being the key word there. We just got done moving all of the
flavor information we used to stash in system_metadata to a JSON blob in
the database. That cuts 10-30 rows of system_metadata for each instance,
depending on the state, and gives us a thing we can selectively join
with instance for a single load with little overhead. We might be able
to get away with going back to fully joining system_metadata given the
reduction in size, but we honestly don't even need to query it as often
after the flavor-ectomy, so I'm not sure it's worth it. Further, after
the explosion of system_metadata which caused us to stop joining it in
the first place, it was realized that a user could generate a lot of
traffic by exhausting their quota of metadata items (which they
control), so we probably want to join user metadata in Python anyway for
that reason.

So I guess the summary is: I think with flavor data out of the path, the
major offender is gone, such that this becomes extremely low on the
priority list.

--Dan





Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Clint Byrum
Excerpts from Mike Bayer's message of 2015-08-13 11:03:32 +0800:
 
 On 8/12/15 10:29 PM, Clint Byrum wrote:
  Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
  If OTOH we are referring to the width of the columns and the join is
  such that you're going to get the same A identity over and over again,
  if you join A and B you get a wide row with all of A and B with a very
  large amount of redundant data sent over the wire again and again (note
  that the database drivers available to us in Python always send all rows
  and columns over the wire unconditionally, whether or not we fetch them
  in application code).
  Yep, it was this. N instances times M rows of metadata each. If you pull
  100 instances and they each have 30 rows of system metadata, that's a
  lot of data, and most of it is the instance being repeated 30 times for
  each metadata row. When we first released code doing this, a prominent
  host immediately raised the red flag because their DB traffic shot
  through the roof.
 
 In the past I've taken a different approach to problematic
 one-to-many relationships and have made the metadata a binary JSON blob.
  Is there some reason that won't work? Of course, this type of thing
  can run into concurrency issues on update, but these can be handled by
  SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata
  is nearly always queried as a whole, this seems like a valid approach
  that would keep DB traffic low but also ease the burden of reassembling
  the collection in nova-api.
 
 JSON blobs have the disadvantage that you are piggybacking an entirely 
 different storage model on top of the relational one, losing all the 
 features you might like about the relational model: rich datatypes 
 (I understand our JSON decoders trip up on plain datetimes?), insert 
 defaults, nullability constraints, a fixed, predefined schema that can 
 be altered in a controlled, all-or-nothing way, efficient storage 
 characteristics, and of course reasonable querying capabilities.   They 
 are useful IMO only for small sections of data that are amenable to 
 ad-hoc changes in schema like simple bags of key-value pairs containing 
 miscellaneous features.
 

Agreed on all points! And metadata for instances is exactly that:
a simple bag of key/value strings that is almost always queried and
delivered as a whole.



Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Clint Byrum
Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:
  If OTOH we are referring to the width of the columns and the join is
  such that you're going to get the same A identity over and over again, 
  if you join A and B you get a wide row with all of A and B with a very
  large amount of redundant data sent over the wire again and again (note
  that the database drivers available to us in Python always send all rows
  and columns over the wire unconditionally, whether or not we fetch them
  in application code).
 
 Yep, it was this. N instances times M rows of metadata each. If you pull
 100 instances and they each have 30 rows of system metadata, that's a
 lot of data, and most of it is the instance being repeated 30 times for
 each metadata row. When we first released code doing this, a prominent
 host immediately raised the red flag because their DB traffic shot
 through the roof.
 

In the past I've taken a different approach to problematic
one-to-many relationships and have made the metadata a binary JSON blob.
Is there some reason that won't work? Of course, this type of thing
can run into concurrency issues on update, but these can be handled by
SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata
is nearly always queried as a whole, this seems like a valid approach
that would keep DB traffic low but also ease the burden of reassembling
the collection in nova-api.
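
A minimal sketch of that update path, with an illustrative table name
and an arbitrary retry limit (this is not proposed Nova code):

    import json
    from sqlalchemy.exc import OperationalError

    def update_metadata(session, instance_uuid, changes, attempts=5):
        # Read the blob under a row lock, modify, write back; retry if
        # the transaction is killed as a deadlock victim.
        for _ in range(attempts):
            try:
                row = (
                    session.query(InstanceExtra)
                    .filter_by(instance_uuid=instance_uuid)
                    .with_for_update()  # SELECT ... FOR UPDATE
                    .one()
                )
                metadata = json.loads(row.metadata_blob or '{}')
                metadata.update(changes)
                row.metadata_blob = json.dumps(metadata)
                session.commit()
                return metadata
            except OperationalError:  # e.g. a MySQL deadlock surfaces here
                session.rollback()    # releases the lock before retrying
        raise RuntimeError('metadata update kept deadlocking')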



Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Mike Bayer



On 8/12/15 10:29 PM, Clint Byrum wrote:

Excerpts from Dan Smith's message of 2015-08-12 23:12:23 +0800:

If OTOH we are referring to the width of the columns and the join is
such that you're going to get the same A identity over and over again,
if you join A and B you get a wide row with all of A and B with a very
large amount of redundant data sent over the wire again and again (note
that the database drivers available to us in Python always send all rows
and columns over the wire unconditionally, whether or not we fetch them
in application code).

Yep, it was this. N instances times M rows of metadata each. If you pull
100 instances and they each have 30 rows of system metadata, that's a
lot of data, and most of it is the instance being repeated 30 times for
each metadata row. When we first released code doing this, a prominent
host immediately raised the red flag because their DB traffic shot
through the roof.


In the past I've taken a different approach to problematic
one-to-many relationships and have made the metadata a binary JSON blob.
Is there some reason that won't work? Of course, this type of thing
can run into concurrency issues on update, but these can be handled by
SELECT..FOR UPDATE + intelligent retry on deadlock. Since the metadata
is nearly always queried as a whole, this seems like a valid approach
that would keep DB traffic low but also ease the burden of reassembling
the collection in nova-api.


JSON blobs have the disadvantage that you are piggybacking an entirely 
different storage model on top of the relational one, losing all the 
features you might like about the relational model: rich datatypes 
(I understand our JSON decoders trip up on plain datetimes?), insert 
defaults, nullability constraints, a fixed, predefined schema that can 
be altered in a controlled, all-or-nothing way, efficient storage 
characteristics, and of course reasonable querying capabilities.   They 
are useful IMO only for small sections of data that are amenable to 
ad-hoc changes in schema like simple bags of key-value pairs containing 
miscellaneous features.
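
(The datetime point is easy to demonstrate; the stdlib encoder simply
refuses them --

    import json
    from datetime import datetime

    json.dumps({'created_at': datetime.utcnow()})
    # TypeError: ... is not JSON serializable

-- so every datetime has to be hand-converted on the way in and out.)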







Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Mike Bayer



On 8/12/15 1:49 PM, Sachin Manpathak wrote:

Thanks, this feedback was helpful.
Perhaps my paraphrasing was misleading. I am not running openstack at 
scale in order to see how much the DB can sustain. My observation was 
that the host running nova services saturates on CPU much earlier than 
the DB does.

You absolutely *want* a single host to be saturated *way* before the 
database is; the database here is a single vertical service intended to 
serve hundreds or thousands of horizontally scaled clients 
simultaneously.  A single request at a time should not even be a blip 
in the database's view of things.




Joins could be one of the reasons. I also observed that background 
tasks like instance creation and resource/stats updates contend with get 
queries. In addition to caching optimizations, prioritizing tasks in 
nova could help.


Is there a nova API to fetch a list of instances without metadata? Until 
I find a good way to profile openstack code, changing the queries can 
be a good experiment IMO.



On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith d...@danplanet.com wrote:


 If OTOH we are referring to the width of the columns and the join is
 such that you're going to get the same A identity over and over again,
 if you join A and B you get a wide row with all of A and B with a very
 large amount of redundant data sent over the wire again and again (note
 that the database drivers available to us in Python always send all rows
 and columns over the wire unconditionally, whether or not we fetch them
 in application code).

Yep, it was this. N instances times M rows of metadata each. If you pull
100 instances and they each have 30 rows of system metadata, that's a
lot of data, and most of it is the instance being repeated 30 times for
each metadata row. When we first released code doing this, a prominent
host immediately raised the red flag because their DB traffic shot
through the roof.

 In this case you *do* want to do the join in
 Python to some extent, though you use the database to deliver the
 simplest information possible to work with first; you get the full row
 for all of the A entries, then a second query for all of B plus A's
 primary key that can be quickly matched to that of A.

This is what we're doing. Fetch the list of instances that match the
filters, then for the ones that were returned, get their metadata.

--Dan







Re: [openstack-dev] In memory joins in Nova

2015-08-12 Thread Sachin Manpathak
Thanks, this feedback was helpful.
Perhaps my paraphrasing was misleading. I am not running openstack at scale
in order to see how much the DB can sustain. My observation was that the
host running nova services saturates on CPU much earlier than the DB does.
Joins could be one of the reasons. I also observed that background tasks
like instance creation and resource/stats updates contend with get queries.
In addition to caching optimizations, prioritizing tasks in nova could help.

Is there a nova API to fetch a list of instances without metadata? Until I
find a good way to profile openstack code, changing the queries can be a
good experiment IMO.



On Wed, Aug 12, 2015 at 8:12 AM, Dan Smith d...@danplanet.com wrote:

  If OTOH we are referring to the width of the columns and the join is
  such that you're going to get the same A identity over and over again,
  if you join A and B you get a wide row with all of A and B with a very
  large amount of redundant data sent over the wire again and again (note
  that the database drivers available to us in Python always send all rows
  and columns over the wire unconditionally, whether or not we fetch them
  in application code).

 Yep, it was this. N instances times M rows of metadata each. If you pull
 100 instances and they each have 30 rows of system metadata, that's a
 lot of data, and most of it is the instance being repeated 30 times for
 each metadata row. When we first released code doing this, a prominent
 host immediately raised the red flag because their DB traffic shot
 through the roof.

  In this case you *do* want to do the join in
  Python to some extent, though you use the database to deliver the
  simplest information possible to work with first; you get the full row
  for all of the A entries, then a second query for all of B plus A's
  primary key that can be quickly matched to that of A.

 This is what we're doing. Fetch the list of instances that match the
 filters, then for the ones that were returned, get their metadata.

 --Dan



Re: [openstack-dev] In memory joins in Nova

2015-08-11 Thread Sachin Manpathak
Here are a few --
instance_get_all_by_filters joins manually with
instances_fill_metadata --
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

Almost all instance query functions manually join with instance_metadata.

Another example was the compute_node_get_all function, which joined the
compute_node, services, and ip tables. But it has been simplified in the
current codebase (I am working on Juno).




On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com wrote:

 Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
  Hi folks,
  The Nova codebase seems to follow a manual-joins model where all data
  required by an API is fetched from multiple tables and then joined
  manually by using (in most cases) Python dictionary lookups.
 
  I was wondering about the basic reasoning for doing so. I usually find
  openstack services to be CPU bound in a medium sized environment and
  non-trivial utilization seems to be from parts of code which do manual
  joins.

 Could you please cite specific examples so we can follow along with your
 thinking without having to repeat your analysis?

 Thanks!



Re: [openstack-dev] In memory joins in Nova

2015-08-11 Thread Chris Friesen
Just curious...have you measured this consuming a significant amount of CPU 
time?  Or is it more a gut feel of "this looks like it might be expensive"?


Chris


On 08/11/2015 04:51 PM, Sachin Manpathak wrote:

Here are a few --
instance_get_all_by_filters joins manually with
instances_fill_metadata --
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

Almost all instance query functions manually join with instance_metadata.

Another example was the compute_node_get_all function, which joined the
compute_node, services, and ip tables. But it has been simplified in the
current codebase (I am working on Juno).




On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
 Hi folks,
 The Nova codebase seems to follow a manual-joins model where all data
 required by an API is fetched from multiple tables and then joined
 manually by using (in most cases) Python dictionary lookups.

 I was wondering about the basic reasoning for doing so. I usually find
 openstack services to be CPU bound in a medium sized environment and
 non-trivial utilization seems to be from parts of code which do manual
 joins.

Could you please cite specific examples so we can follow along with your
thinking without having to repeat your analysis?

Thanks!







Re: [openstack-dev] In memory joins in Nova

2015-08-11 Thread Clint Byrum
Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
 Hi folks,
 The Nova codebase seems to follow a manual-joins model where all data
 required by an API is fetched from multiple tables and then joined
 manually by using (in most cases) Python dictionary lookups.
 
 I was wondering about the basic reasoning for doing so. I usually find
 openstack services to be CPU bound in a medium sized environment and
 non-trivial utilization seems to be from parts of code which do manual
 joins.

Could you please cite specific examples so we can follow along with your
thinking without having to repeat your analysis?

Thanks!



Re: [openstack-dev] In memory joins in Nova

2015-08-11 Thread Dan Smith
 Here are a few --
 instance_get_all_by_filters joins manually with 
 instances_fill_metadata --
 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782
 
 Almost all instance query functions manually join with instance_metadata.

This was done because joining metadata with instances was resulting in
very large result sets from the database. Instance objects are very
large, and each can have many rows in the metadata tables (especially in
older versions of nova). When joining these tables and doing queries for
all instances on a host or across a time interval, the result sets are
very large (we had a big issue with this at the grizzly release, as
reported by real deployments). So, we're trading some overhead (and
atomicity) for significantly lower DB traffic.

I don't think this is likely to be the source of your measured CPU
overhead though. The metadata joining is O(n) where n is the number of
instances in the result. I would think that is much smaller than the
overhead of processing the result of the query into ORM objects.

--Dan





Re: [openstack-dev] In memory joins in Nova

2015-08-11 Thread Sachin Manpathak
I am struggling with python code profiling in general. It has its own
caveats, like 100%-plus overhead.
However, on a host running only nova services (DB on a different host), I
see CPU utilization spike up quickly with scale. The DB server is
relatively calm and never goes over 20%. On a system which relies on the
DB to fetch all the data, this should not happen.

I could not find any analysis of nova performance either. I would
appreciate it if someone could point me to one.

Thanks,





On Tue, Aug 11, 2015 at 3:57 PM, Chris Friesen chris.frie...@windriver.com
wrote:

 Just curious...have you measured this consuming a significant amount of
 CPU time?  Or is it more a gut feel of "this looks like it might be
 expensive"?

 Chris


 On 08/11/2015 04:51 PM, Sachin Manpathak wrote:

 Here are a few --
 instance_get_all_by_filters joins manually with
 instances_fill_metadata --

 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890

 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782

 Almost all instance query functions manually join with instance_metadata.

 Another example was the compute_node_get_all function, which joined the
 compute_node, services, and ip tables. But it has been simplified in the
 current codebase (I am working on Juno).




 On Tue, Aug 11, 2015 at 3:09 PM, Clint Byrum cl...@fewbar.com
 mailto:cl...@fewbar.com wrote:

 Excerpts from Sachin Manpathak's message of 2015-08-12 05:40:36 +0800:
  Hi folks,
  The Nova codebase seems to follow a manual-joins model where all data
  required by an API is fetched from multiple tables and then joined
  manually by using (in most cases) Python dictionary lookups.
 
  I was wondering about the basic reasoning for doing so. I usually find
  openstack services to be CPU bound in a medium sized environment and
  non-trivial utilization seems to be from parts of code which do manual
  joins.

 Could you please cite specific examples so we can follow along with
 your
 thinking without having to repeat your analysis?

 Thanks!





